Abstract



The subject of quantum computing brings together ideas from classical information theory, computer science, 
and quantum physics. This review aims to summarise not just quantum computing, but the whole subject of 
quantum information theory. Information can be identified as the most general thing which must propagate 
from a cause to an effect. It therefore has a fundamentally important role in the science of physics. However,
the mathematical treatment of information, especially information processing, is quite recent, dating from the 
mid-twentieth century. This has meant that the full significance of information as a basic concept in physics is 
only now being discovered. This is especially true in quantum mechanics. The theory of quantum information 
and computing puts this significance on a firm footing, and has led to some profound and exciting new
insights into the natural world. Among these are the use of quantum states to permit the secure transmission of 
classical information (quantum cryptography), the use of quantum entanglement to permit reliable transmission
of quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible 
noise processes (quantum error correction), and the use of controlled quantum evolution for efficient computation 
(quantum computation). The common theme of all these insights is the use of quantum entanglement as a 
computational resource. 

It turns out that information theory and quantum mechanics fit together very well. In order to explain their rela- 
tionship, this review begins with an introduction to classical information theory and computer science, including 
Shannon's theorem, error correcting codes, Turing machines and computational complexity. The principles of 
quantum mechanics are then outlined, and the EPR experiment described. The EPR-Bell correlations, and
quantum entanglement in general, form the essential new ingredient which distinguishes quantum from classical 
information theory, and, arguably, quantum from classical physics. 

Basic quantum information ideas are next outlined, including qubits and data compression, quantum gates, the 
'no cloning' property, and teleportation. Quantum cryptography is briefly sketched. The universal quantum 
computer is described, based on the Church-Turing Principle and a network model of computation. Algorithms
for such a computer are discussed, especially those for finding the period of a function, and searching a random 
list. Such algorithms prove that a quantum computer of sufficiently precise construction is not only fundamen-
tally different from any computer which can only manipulate classical information, but can compute a small
class of functions with greater efficiency. This implies that some important computational tasks are impossible 
for any device apart from a quantum computer. 

To build a universal quantum computer is well beyond the abilities of current technology. However, the principles 
of quantum information physics can be tested on smaller devices. The current experimental situation is reviewed, 
with emphasis on the linear ion trap, high-Q optical cavities, and nuclear magnetic resonance methods. These 
allow coherent control in a Hilbert space of eight dimensions (3 qubits), and should be extendable up to a 
thousand or more dimensions (10 qubits). Among other things, these systems will allow the feasibility of 
quantum computing to be assessed. In fact such experiments are so difficult that it seemed likely until recently 
that a practically useful quantum computer (requiring, say, 1000 qubits) was actually ruled out by considerations 
of experimental imprecision and the unavoidable coupling between any system and its environment. However, a 
further fundamental part of quantum information physics provides a solution to this impasse. This is quantum 
error correction (QEC). 

An introduction to quantum error correction is provided. The evolution of the quantum computer is restricted 
to a carefully chosen sub-space of its Hilbert space. Errors are almost certain to cause a departure from 
this sub-space. QEC provides a means to detect and undo such departures without upsetting the quantum 
computation. This achieves the apparently impossible, since the computation preserves quantum coherence 
even though during its course all the qubits in the computer will have relaxed spontaneously many times.

The review concludes with an outline of the main features of quantum information physics, and avenues for
future research. 

PACS 03.65.Bz, 89.70.+c

1 Introduction

The science of physics seeks to ask, and find precise 
answers to, basic questions about why nature is as it 
is. Historically, the fundamental principles of physics 
have been concerned with questions such as "what are 
things made of?" and "why do things move as they 
do?" In his Principia, Newton gave very wide-ranging 
answers to some of these questions. By showing that 
the same mathematical equations could describe the
motions of everyday objects and of planets, he showed 
that an everyday object such as a tea pot is made of 
essentially the same sort of stuff as a planet: the mo- 
tions of both can be described in terms of their mass 
and the forces acting on them. Nowadays we would 
say that both move in such a way as to conserve en- 
ergy and momentum. In this way, physics allows us to 
abstract from nature concepts such as energy or mo- 
mentum which always obey fixed equations, although 
the same energy might be expressed in many different 
ways: for example, an electron in the large electron- 
positron collider at CERN, Geneva, can have the same 
kinetic energy as a slug on a lettuce leaf. 

Another thing which can be expressed in many dif- 
ferent ways is information. For example, the two 
statements "the quantum computer is very interest-
ing" and "l'ordinateur quantique est très intéressant"
have something in common, although they share no 
words. The thing they have in common is their in- 
formation content. Essentially the same information 
could be expressed in many other ways, for example 
by substituting numbers for letters in a scheme such
as a → 97, b → 98, c → 99 and so on, in which case
the English version of the above statement becomes 116
104 101 32 113 117 97 110 116 117 109 ... . It is very 
significant that information can be expressed in differ- 
ent ways without losing its essential nature, since this 
leads to the possibility of the automatic manipulation 
of information: a machine need only be able to ma- 
nipulate quite simple things like integers in order to 
do surprisingly powerful information processing, from 
document preparation to differential calculus, even to 
translating between human languages. We are familiar 
with this now, because of the ubiquitous computer, but 
even fifty years ago such a widespread significance of 
automated information processing was not foreseen.
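
The letter-to-number substitution just described can be tried directly. The following is a minimal sketch in Python, assuming the scheme is the usual ASCII/Unicode code-point mapping (a → 97, b → 98, ...); the function names encode and decode are illustrative choices, not part of the original discussion.

```python
def encode(text: str) -> list[int]:
    """Replace each character by its integer code (a -> 97, b -> 98, ...)."""
    return [ord(ch) for ch in text]


def decode(codes: list[int]) -> str:
    """Invert the substitution: integers back to characters."""
    return "".join(chr(n) for n in codes)


message = "the quantum computer is very interesting"
codes = encode(message)
print(codes[:11])   # [116, 104, 101, 32, 113, 117, 97, 110, 116, 117, 109]
print(decode(codes) == message)   # True: nothing was lost in translation
```

The round trip recovers the sentence exactly, which is the sense in which the two representations carry the same information.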

However, there is one thing that all ways of express-
ing information must have in common: they all use
real physical things to do the job. Spoken words are 
conveyed by air pressure fluctuations, written ones by 
arrangements of ink molecules on paper, even thoughts 
depend on neurons (Landauer 1991). The rallying cry 
of the information physicist is "no information without 
physical representation!" Conversely, the fact that in- 
formation is insensitive to exactly how it is expressed, 
and can be freely translated from one form to another, 
makes it an obvious candidate for a fundamentally im- 
portant role in physics, like energy and momentum and 
other such abstractions. However, until the second 
half of this century, the precise mathematical treat- 
ment of information, especially information process- 
ing, was undiscovered, so the significance of informa- 
tion in physics was only hinted at in concepts such 
as entropy in thermodynamics. It now appears that 
information may have a much deeper significance. His-
torically, much of fundamental physics has been con-
cerned with discovering the fundamental particles of
nature and the equations which describe their motions 
and interactions. It now appears that a different pro- 
gramme may be equally important: to discover the 
ways that nature allows, and prevents, information to 
be expressed and manipulated, rather than particles to 
move. For example, the best way to state exactly what 
can and cannot travel faster than light is to identify 
information as the speed-limited entity. In quantum 
mechanics, it is highly significant that the state vec- 
tor must not contain, whether explicitly or implicitly, 
more information than can meaningfully be associated 
with a given system. Among other things this produces 
the wavefunction symmetry requirements which lead to 
Bose-Einstein and Fermi-Dirac statistics, the periodic
structure of atoms, and so on. 

The programme to re-investigate the fundamental prin- 
ciples of physics from the standpoint of information 
theory is still in its infancy. However, it already ap- 
pears to be highly fruitful, and it is this ambitious pro- 
gramme that I aim to summarise. 

Historically, the concept of information in physics does 
not have a clear-cut origin. An important thread can 
be traced if we consider the paradox of Maxwell's de- 
mon of 1871 (fig. 1) (see also Brillouin 1956). Re- 
call that Maxwell's demon is a creature that opens 
and closes a trap door between two compartments of 
a chamber containing gas, and pursues the subversive
policy of only opening the door when fast molecules
approach it from the right, or slow ones from the left. 
In this way the demon establishes a temperature dif- 
ference between the two compartments without doing 
any work, in violation of the second law of thermody- 
namics, and consequently permitting a host of contra- 
dictions. 

A number of attempts were made to exorcise Maxwell's 
demon (see Bennett 1987), such as arguments that the 
demon cannot gather information without doing work, 
or without disturbing (and thus heating) the gas, both 
of which are untrue. Some were tempted to propose 
that the 2nd law of thermodynamics could indeed be 
violated by the actions of an "intelligent being." It 
was not until 1929 that Leo Szilard made progress by 
reducing the problem to its essential components, in 
which the demon need merely identify whether a sin- 
gle molecule is to the right or left of a sliding partition, 
and its action allows a simple heat engine, called Szi- 
lard's engine, to be run. Szilard still had not solved the 
problem, since his analysis was unclear about whether 
or not the act of measurement, whereby the demon 
learns whether the molecule is to the left or the right, 
must involve an increase in entropy. 

A definitive and clear answer was not forthcoming, sur- 
prisingly, until a further fifty years had passed. In the 
intermediate years digital computers were developed, 
and the physical implications of information gathering 
and processing were carefully considered. The ther- 
modynamic costs of elementary information manipu- 
lations were analysed by Landauer and others during 
the 1960s (Landauer 1961, Keyes and Landauer 1970), 
and those of general computations by Bennett, Fred- 
kin, Toffoli and others during the 1970s (Bennett 1973, 
Toffoli 1980, Fredkin and Toffoli 1982). It was found
that almost anything can in principle be done in a 
reversible manner, i.e. with no entropy cost at all 
(Bennett and Landauer 1985). Bennett (1982) made 
explicit the relation between this work and Maxwell's 
paradox by proposing that the demon can indeed learn 
where the molecule is in Szilard's engine without doing 
any work or increasing any entropy in the environment, 
and so obtain useful work during one stroke of the en- 
gine. However, the information about the molecule's 
location must then be present in the demon's memory 
(fig. 1). As more and more strokes are performed, more 
and more information gathers in the demon's memory.
To complete a thermodynamic cycle, the demon must
erase its memory, and it is during this erasure opera- 
tion that we identify an increase in entropy in the en- 
vironment, as required by the 2nd law. This completes 
the essential physics of Maxwell's demon; further sub- 
tleties are discussed by Zurek (1989), Caves (1990), and 
Caves, Unruh and Zurek (1990). 
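
The bookkeeping in this argument can be made quantitative with the standard figure usually attached to it, which is not stated explicitly above: erasing one bit at temperature T releases at least k_B T ln 2 of heat, and one stroke of Szilard's engine yields at most the same amount of work. The sketch below simply evaluates these two quantities; the function names and the choice T = 300 K are assumptions made for illustration.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact value in the 2019 SI)


def erasure_cost(bits: float, temperature: float) -> float:
    """Minimum heat (J) passed to the environment when `bits` bits are erased."""
    return bits * BOLTZMANN * temperature * math.log(2)


def szilard_work(strokes: int, temperature: float) -> float:
    """Maximum work (J) from `strokes` strokes, one bit recorded per stroke."""
    return strokes * BOLTZMANN * temperature * math.log(2)


T = 300.0        # assumed room temperature, in kelvin
strokes = 1000
net = szilard_work(strokes, T) - erasure_cost(strokes, T)
print(erasure_cost(1, T))   # about 2.87e-21 J per bit
print(net)                  # the best case breaks even, so the 2nd law survives
```

In the ideal limit the work gained over many strokes is exactly paid back when the demon's memory is cleared.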

The thread we just followed was instructive, but to 
provide a complete history of ideas relevant to quan-
tum computing is a formidable task. Our subject
brings together what are arguably two of the great-
est revolutions in twentieth-century science, namely
quantum mechanics and information science (includ-
ing computer science). The relationship between these
two giants is illustrated in fig. 2. 

Classical information theory is founded on the defi- 
nition of information. A warning is in order here. 
Whereas the theory tries to capture much of the normal 
meaning of the term 'information', it can no more do 
justice to the full richness of that term in everyday lan- 
guage than particle physics can encapsulate the every- 
day meaning of 'charm'. 'Information' for us will be an 
abstract term, defined in detail in section 2.1. Much of
information theory dates back to seminal work of Shan-
non in the 1940s (Slepian 1974). The observation that
information can be translated from one form to another 
is encapsulated and quantified in Shannon's noiseless 
coding theorem (1948), which quantifies the resources 
needed to store or transmit a given body of informa- 
tion. Shannon also considered the fundamentally im- 
portant problem of communication in the presence of 
noise, and established Shannon's main theorem (sec-
tion 2.4) which is the central result of classical informa-
tion theory. Error-free communication even in the pres-
ence of noise is achieved by means of 'error-correcting
codes', and their study is a branch of mathematics in 
its own right. Indeed, the journal IEEE Transactions 
on Information Theory is almost totally taken up with 
the discovery and analysis of error-correction by cod- 
ing. Pioneering work in this area was done by Golay 
(1949) and Hamming (1950). 
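
To give a flavour of what an error-correcting code does, here is a minimal sketch of the [7,4] Hamming code, one standard textbook layout of the kind of code introduced in the pioneering work just cited. The bit ordering (parity bits at positions 1, 2 and 4) and the function names are choices made for this illustration.

```python
def hamming_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error was detected
    if position:
        c[position - 1] ^= 1          # the syndrome spells out the bad position
    return [c[2], c[4], c[5], c[6]]


sent = hamming_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                      # noise flips one bit in transit
print(hamming_decode(received))       # [1, 0, 1, 1]
```

Three extra bits protect four data bits against any single flip. Two or more flips in the same codeword defeat this particular code.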

The foundations of computer science were formulated 
at roughly the same time as Shannon's information 
theory, and this is no coincidence. The father of com- 
puter science is arguably Alan Turing (1912-1954), and 
its prophet is Charles Babbage (1791-1871). Babbage
conceived of most of the essential elements of a mod-
ern computer, though in his day there was not the 
technology available to implement his ideas. A cen- 
tury passed before Babbage's Analytical Engine was 
improved upon when Turing described the Universal 
Turing Machine in the mid 1930s. Turing's genius (see 
Hodges 1983) was to clarify exactly what a calculat- 
ing machine might be capable of, and to emphasise the 
role of programming, i.e. software, even more than 
Babbage had done. The giants on whose shoulders 
Turing stood in order to get a better view were chiefly 
the mathematicians David Hilbert and Kurt Gödel.
Hilbert had emphasised between the 1890s and 1930s 
the importance of asking fundamental questions about 
the nature of mathematics. Instead of asking "is this 
mathematical proposition true?" Hilbert wanted to ask 
"is it the case that every mathematical proposition can 
in principle be proved or disproved?" This was un-
known, but Hilbert's feeling, and that of most mathe-
maticians, was that mathematics was indeed complete,
so that conjectures such as Goldbach's (that every even
number greater than 2 can be written as the sum of
two primes) could
be proved or disproved somehow, although the logical 
steps might be as yet undiscovered. 

Gödel destroyed this hope by establishing the existence
of mathematical propositions which were undecidable, 
meaning that they could be neither proved nor dis- 
proved. The next interesting question was whether it 
would be easy to identify such propositions. Progress 
in mathematics had always relied on the use of cre- 
ative imagination, yet with hindsight mathematical 
proofs appear to be automatic, each step following in- 
evitably from the one before. Hilbert asked whether 
this 'inevitable' quality could be captured by a 'me- 
chanical' process. In other words, was there a universal 
mathematical method, which would establish the truth 
or otherwise of every mathematical assertion? After 
Gödel, Hilbert's problem was re-phrased into that of
establishing decidability rather than truth, and this is 
what Turing sought to address. 

In the words of Newman, Turing's bold innovation was 
to introduce 'paper tape' into symbolic logic. In the 
search for an automatic process by which mathemat- 
ical questions could be decided, Turing envisaged a 
thoroughly mechanical device, in fact a kind of glo- 
rified typewriter (fig. 7). The importance of the Tur-
ing machine (Turing 1936) arises from the fact that it
is sufficiently complicated to address highly sophisti-
cated mathematical questions, but sufficiently simple
to be subject to detailed analysis. Turing used his 
machine as a theoretical construct to show that the 
assumed existence of a mechanical means to establish 
decidability leads to a contradiction (see section 3).
In other words, he was initially concerned with quite 
abstract mathematics rather than practical computa- 
tion. However, by seriously establishing the idea of 
automating abstract mathematical proofs rather than 
merely arithmetic, Turing greatly stimulated the de-
velopment of general purpose information processing.
This was in the days when a "computer" was a person 
doing mathematics. 

Modern computers are neither Turing machines nor 
Babbage engines, though they are based on broadly 
similar principles, and their computational power is 
equivalent (in a technical sense) to that of a Turing 
machine. I will not trace their development here, since 
although this is a wonderful story, it would take too 
long to do justice to the many people involved. Let 
us just remark that all of this development represents 
a great improvement in speed and size, but does not 
involve any change in the essential idea of what a com- 
puter is, or how it operates. Quantum mechanics raises 
the possibility of such a change, however. 

Quantum mechanics is the mathematical structure 
which embraces, in principle, the whole of physics. We 
will not be directly concerned with gravity, high ve- 
locities, or exotic elementary particles, so the standard 
non-relativistic quantum mechanics will suffice. The 
significant feature of quantum theory for our purpose 
is not the precise details of the equations of motion, but 
the fact that they treat quantum amplitudes, or state 
vectors in a Hilbert space, rather than classical vari- 
ables. It is this that allows new types of information 
and computing. 

There is a parallel between Hilbert's questions about
mathematics and the questions we seek to pose in quan- 
tum information theory. Before Hilbert, almost all 
mathematical work had been concerned with estab- 
lishing or refuting particular hypotheses, but Hilbert 
wanted to ask what general type of hypothesis was 
even amenable to mathematical proof. Similarly, most 
research in quantum physics has been concerned with 
studying the evolution of specific physical systems, but
we want to ask what general type of evolution is even
conceivable under quantum mechanical rules. 

The first deep insight into quantum information the- 
ory came with Bell's 1964 analysis of the paradoxical 
thought-experiment proposed by Einstein, Podolsky 
and Rosen (EPR) in 1935. Bell's inequality draws at- 
tention to the importance of correlations between sepa- 
rated quantum systems which have interacted (directly 
or indirectly) in the past, but which no longer influence 
one another. In essence his argument shows that the 
degree of correlation which can be present in such sys- 
tems exceeds that which could be predicted on the basis 
of any law of physics which describes particles in terms 
of classical variables rather than quantum states. Bell's 
argument was clarified by Bohm (1951, also Bohm and 
Aharonov 1957) and by Clauser, Holt, Horne and Shi-
mony (1969), and experimental tests were carried out
in the 1970s (see Clauser and Shimony (1978) and ref-
erences therein). Improvements in such experiments
are largely concerned with preventing the possibility 
of any interaction between the separated quantum sys- 
tems, and a significant step forward was made in the 
experiment of Aspect, Dalibard and Roger (1982), (see 
also Aspect 1991) since in their work any purported in- 
teraction would have either to travel faster than light, 
or possess other almost equally implausible qualities. 
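
The gap between classical and quantum correlations can be seen numerically in the form of the inequality due to Clauser, Holt, Horne and Shimony. The sketch below assumes the standard setting of two spin-half particles in the singlet state, for which quantum mechanics predicts the correlation E(a,b) = -cos(a - b) between measurements along directions at angles a and b. The particular angles and the function names are illustrative choices.

```python
import itertools
import math


def chsh(E, a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


def singlet_correlation(a, b):
    """Quantum prediction for spin measurements at angles a and b on a singlet."""
    return -math.cos(a - b)


def classical_bound():
    """Largest |S| when every outcome is a predetermined value +1 or -1."""
    best = 0
    for A, A2, B, B2 in itertools.product((1, -1), repeat=4):
        best = max(best, abs(A * B - A * B2 + A2 * B + A2 * B2))
    return best


quantum = abs(chsh(singlet_correlation,
                   0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4))
print(classical_bound())   # 2
print(quantum)             # about 2.828, i.e. 2*sqrt(2)
```

Any mixture of predetermined outcomes is an average over the sixteen cases enumerated here, so it cannot exceed 2 either, whereas the quantum prediction reaches 2*sqrt(2).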

The next link between quantum mechanics and infor- 
mation theory came about when it was realised that 
simple properties of quantum systems, such as the un- 
avoidable disturbance involved in measurement, could 
be put to practical use, in quantum cryptography (Wies-
ner 1983, Bennett et al. 1982, Bennett and Brassard
1984; for a recent review see Brassard and Crépeau
1996). Quantum cryptography covers several ideas, of 
which the most firmly established is quantum key dis- 
tribution. This is an ingenious method in which trans- 
mitted quantum states are used to perform a very par- 
ticular communication task: to establish at two sepa- 
rated locations a pair of identical, but otherwise ran- 
dom, sequences of binary digits, without allowing any 
third party to learn the sequence. This is very useful 
because such a random sequence can be used as a cryp- 
tographic key to permit secure communication. The 
significant feature is that the principles of quantum 
mechanics guarantee a type of conservation of quan- 
tum information, so that if the necessary quantum in-
formation arrives at the parties wishing to establish
a random key, they can be sure it has not gone else-
where, such as to a spy. Thus the whole problem of
compromised keys, which fills the annals of espionage, 
is avoided by taking advantage of the structure of the 
natural world. 
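
A toy model may help to show how a spy reveals herself. The sketch below is a classical simulation of the four-state key distribution scheme of Bennett and Brassard (1984), under simplifying assumptions that are mine rather than the text's: ideal single-photon states, no channel noise, and an eavesdropper who measures each signal in a randomly chosen basis and resends what she found. The function names are hypothetical.

```python
import random


def bb84(n_qubits, eavesdrop, rng):
    """Toy BB84 run; returns (alice_key, bob_key) after basis sifting."""
    alice_key, bob_key = [], []
    for _ in range(n_qubits):
        bit = rng.randint(0, 1)          # Alice's raw key bit
        a_basis = rng.randint(0, 1)      # 0 = rectilinear, 1 = diagonal
        state_bit, state_basis = bit, a_basis

        if eavesdrop:                    # intercept-and-resend attack
            e_basis = rng.randint(0, 1)
            if e_basis != state_basis:   # wrong basis: her outcome is random
                state_bit = rng.randint(0, 1)
            state_basis = e_basis        # she resends in her own basis

        b_basis = rng.randint(0, 1)
        if b_basis == state_basis:
            result = state_bit
        else:                            # wrong basis: Bob's outcome is random
            result = rng.randint(0, 1)

        if a_basis == b_basis:           # sifting: keep matching bases only
            alice_key.append(bit)
            bob_key.append(result)
    return alice_key, bob_key


def error_rate(key_a, key_b):
    """Fraction of sifted positions where the two keys disagree."""
    return sum(x != y for x, y in zip(key_a, key_b)) / len(key_a)


print(error_rate(*bb84(20000, False, random.Random(1))))   # 0.0
print(error_rate(*bb84(20000, True, random.Random(2))))    # roughly 0.25
```

Without the spy the sifted keys agree perfectly. With her, about a quarter of the sifted bits disagree, so comparing a sample of the key in public exposes the intrusion.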

While quantum cryptography was being analysed and 
demonstrated, the quantum computer was undergoing 
a quiet birth. Since quantum mechanics underlies the 
behaviour of all systems, including those we call classi-
cal ("even a screwdriver is quantum mechanical", Lan-
dauer (1995)), it was not obvious how to conceive of
a distinctively quantum mechanical computer, i.e. one 
which did not merely reproduce the action of a classical 
Turing machine. Obviously it is not sufficient merely 
to identify a quantum mechanical system whose evolu- 
tion could be interpreted as a computation; one must 
prove a much stronger result than this. Conversely, we
know that classical computers can simulate, by their
computations, the evolution of any quantum system
... with one reservation: no classical process will allow
one to prepare separated systems whose correlations 
break the Bell inequality. It appears from this that the 
EPR-Bell correlations are the quintessential quantum- 
mechanical property (Feynman 1982). 

In order to think about computation from a quantum- 
mechanical point of view, the first ideas involved con- 
verting the action of a Turing machine into an equiv- 
alent reversible process, and then inventing a Hamil- 
tonian which would cause a quantum system to evolve 
in a way which mimicked a reversible Turing machine. 
This depended on the work of Bennett (1973; see also 
Lecerf 1963) who had shown that a universal classical
computing machine (such as Turing's) could be made 
reversible while retaining its simplicity. Benioff (1980, 
1982) and others proposed such Turing-like Hamiltoni-
ans in the early 1980s. Although Benioff's ideas did not
allow the full analysis of quantum computation, they 
showed that unitary quantum evolution is at least as 
powerful computationally as a classical computer. 
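
What 'reversible' means for a logic operation can be illustrated with the three-bit gate associated with Toffoli, whose work on reversible computation was cited earlier. This is only an illustration of reversible logic, not Bennett's reversible Turing machine construction; the helper reversible_and is a hypothetical name.

```python
import itertools


def toffoli(a, b, c):
    """Toffoli gate: flip the target c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)


def reversible_and(a, b):
    """AND computed reversibly: inputs are kept, the result lands on a spare 0."""
    return toffoli(a, b, 0)


# The gate permutes the 8 possible inputs and is its own inverse,
# so no information is ever discarded.
inputs = list(itertools.product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]
print(sorted(outputs) == inputs)                                   # True
print(all(toffoli(*toffoli(*bits)) == bits for bits in inputs))    # True
print([reversible_and(a, b)[2]
       for a, b in itertools.product((0, 1), repeat=2)])           # [0, 0, 0, 1]
```

An ordinary AND gate maps three different inputs to the output 0 and so cannot be undone; the reversible version avoids this by carrying its inputs along.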

A different approach was taken by Feynman (1982, 
1986) who considered the possibility not of univer- 
sal computation, but of universal simulation — i.e. a 
purpose-built quantum system which could simulate 
the physical behaviour of any other. Clearly, such a 
simulator would be a universal computer too, since 
any computer must be a physical system. Feynman
gave arguments which suggested that quantum evolu-
tion could be used to compute certain problems more
efficiently than any classical computer, but his device 
was not sufficiently specified to be called a computer, 
since he assumed that any interaction between adjacent 
two-state systems could be 'ordered', without saying 
how. 

In 1985 an important step forward was taken by 
Deutsch. Deutsch's proposal is widely considered to 
represent the first blueprint for a quantum computer, 
in that it is sufficiently specific and simple to allow real 
machines to be contemplated, but sufficiently versa- 
tile to be a universal quantum simulator, though both 
points are debatable. Deutsch's system is essentially a 
line of two-state systems, and looks more like a regis- 
ter machine than a Turing machine (both are universal 
classical computing machines). Deutsch proved that 
if the two-state systems could be made to evolve by 
means of a specific small set of simple operations, then 
any unitary evolution could be produced, and there- 
fore the evolution could be made to simulate that of 
any physical system. He also discussed how to pro- 
duce Turing-like behaviour using the same ideas. 

Deutsch's simple operations are now called quantum 
'gates', since they play a role analogous to that of bi- 
nary logic gates in classical computers. Various authors 
have investigated the minimal class of gates which are 
sufficient for quantum computation.
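
As a concrete illustration of gates acting on two-state systems, the sketch below applies two commonly used gates, the Hadamard rotation and the controlled-NOT, to a pair of qubits represented as a list of four amplitudes. These particular gates are chosen here for familiarity and are not claimed to be Deutsch's original set; the helper names apply and kron are assumptions of this example.

```python
import math


def apply(gate, state):
    """Multiply a state vector (list of amplitudes) by a gate (list of rows)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


def kron(A, B):
    """Tensor product of two matrices, for gates acting on separate qubits."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]
IDENTITY = [[1, 0], [0, 1]]
# Controlled-NOT, first qubit as control; basis order |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

state = [1, 0, 0, 0]                              # start in |00>
state = apply(kron(HADAMARD, IDENTITY), state)    # (|00> + |10>)/sqrt(2)
state = apply(CNOT, state)                        # (|00> + |11>)/sqrt(2)
print(state)
```

Two simple gates already produce an entangled pair of the kind that appears in the EPR experiment.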

The two questionable aspects of Deutsch's proposal are 
its efficiency and realisability. The question of effi- 
ciency is absolutely fundamental in computer science, 
and on it the concept of 'universality' turns. A uni- 
versal computer is one that not only can reproduce 
(i.e. simulate) the action of any other, but can do so 
without running too slowly. The 'too slowly' here is 
defined in terms of the number of computational steps 
required: this number must not increase exponentially 
with the size of the input (the precise meaning will be 
explained in section 3). Deutsch's simulator is not
universal in this strict sense, though it was shown to 
be efficient for simulating a wide class of quantum sys- 
tems by Lloyd (1996). However, Deutsch's work has es- 
tablished the concepts of quantum networks (Deutsch 
1989) and quantum logic gates, which are extremely 
important in that they allow us to think clearly about 
quantum computation. 



In the early 1990s several authors (Deutsch and Jozsa
1992, Berthiaume and Brassard 1992, Bernstein and
Vazirani 1993) sought computational tasks which could
be solved by a quantum computer more efficiently than
any classical computer. Such a quantum algorithm 
would play a conceptual role similar to that of Bell's 
inequality, in defining something of the essential nature 
of quantum mechanics. Initially only very small differ- 
ences in performance were found, in which quantum 
mechanics permitted an answer to be found with cer- 
tainty, as long as the quantum system was noise-free, 
where a probabilistic classical computer could achieve 
an answer 'only' with high probability. An important 
advance was made by Simon (1994), who described 
an efficient quantum algorithm for a (somewhat ab- 
stract) problem for which no efficient solution was pos- 
sible classically, even by probabilistic methods. This 
inspired Shor (1994) who astonished the community 
by describing an algorithm which was not only efficient 
on a quantum computer, but also addressed a central 
problem in computer science: that of factorising large 
integers. 

Shor discussed both factorisation and discrete log- 
arithms, making use of a quantum Fourier trans- 
form method discovered by Coppersmith (1994) and 
Deutsch. Further important quantum algorithms were 
discovered by Grover (1997) and Kitaev (1995). 
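
The link between factorisation and finding the period of a function is elementary number theory and can be sketched classically. In the example below the period of f(x) = a^x mod N is found by brute force, which is exactly the step that takes exponentially long in general and that the quantum algorithm speeds up; everything else is cheap classical arithmetic. The function names are illustrative.

```python
import math


def order(a, n):
    """Smallest r > 0 with a**r = 1 (mod n), found by brute force."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_period(a, n):
    """Try to split n using the period of a**x mod n; None if a is unlucky."""
    common = math.gcd(a, n)
    if common != 1:
        return common, n // common      # a already shares a factor with n
    r = order(a, n)
    if r % 2 == 1:
        return None                     # odd period: try another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None                     # a**(r/2) = -1 (mod n): try another a
    p = math.gcd(half - 1, n)
    return p, n // p


print(order(7, 15))                 # 4
print(factor_from_period(7, 15))    # (3, 5)
```

For N = 15 and a = 7 the period is 4, so 7^2 - 1 = 48 shares the factor 3 with 15.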

Just as with classical computation and information the- 
ory, once theoretical ideas about computation had got 
under way, an effort was made to establish the essential 
nature of quantum information — the task analogous to 
Shannon's work. The difficulty here can be seen by 
considering the simplest quantum system, a two-state 
system such as a spin half in a magnetic field. The 
quantum state of a spin is a continuous quantity de- 
fined by two real numbers, so in principle it can store 
an infinite amount of classical information. However, 
a measurement of a spin will only provide a single two- 
valued answer (spin up/spin down) — there is no way to 
gain access to the infinite information which appears 
to be there, therefore it is incorrect to consider the 
information content in those terms. This is reminis- 
cent of the renormalisation problem in quantum elec- 
trodynamics. How much information can a two-state 
quantum system store, then? The answer, provided by 
Jozsa and Schumacher (1994) and Schumacher (1995),
is one two-state system's worth! Of course Schumacher
and Jozsa did more than propose this simple answer,
rather they showed that the two-state system plays the 
role in quantum information theory analogous to that 
of the bit in classical information theory, in that the 
quantum information content of any quantum system 
can be meaningfully measured as the minimum num- 
ber of two-state systems, now called quantum bits or 
qubits, which would be needed to store or transmit the 
system's state with high accuracy. 
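
The contrast between the two real numbers that define a spin state and the single bit that a measurement returns can be simulated in a few lines. This sketch assumes the usual parametrisation cos(θ/2)|0> + e^{iφ} sin(θ/2)|1> and a measurement in the up/down basis; the function names and the chosen angles are illustrative.

```python
import math
import random


def qubit(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return (complex(math.cos(theta / 2)),
            complex(math.cos(phi), math.sin(phi)) * math.sin(theta / 2))


def measure(state, rng):
    """Measure in the |0>, |1> basis: one bit comes out, the angles do not."""
    alpha, _ = state
    return 0 if rng.random() < abs(alpha) ** 2 else 1


rng = random.Random(0)
theta, phi = math.pi / 3, 0.7        # two real parameters go in
# Each shot needs a freshly prepared copy, since measuring destroys the state.
shots = [measure(qubit(theta, phi), rng) for _ in range(10000)]
frequency_of_1 = sum(shots) / len(shots)
print(frequency_of_1)                # close to sin^2(theta/2) = 0.25
```

Only by consuming many identically prepared copies can θ be estimated, and φ has no influence on the outcomes in this basis. A single spin yields a single bit.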

Let us return to the question of realisability of quan- 
tum computation. It is an elementary, but fundamen- 
tally important, observation that the quantum inter- 
ference effects which permit algorithms such as Shor's 
are extremely fragile: the quantum computer is ultra-
sensitive to experimental noise and imprecision. It is
not true that early workers were unaware of this diffi- 
culty, rather their first aim was to establish whether a 
quantum computer had any fundamental significance 
at all. Armed with Shor's algorithm, it now appears 
that such a fundamental significance is established, by 
the following argument: either nature does allow a
device to be run with sufficient precision to perform
Shor's algorithm for large integers (greater than, say, a
googol, 10^100), or there are fundamental natural limits
to precision in real systems. Both eventualities repre- 
sent an important insight into the laws of nature. 

At this point, ideas of quantum information and quan- 
tum computing come together. For, a quantum com- 
puter can be made much less sensitive to noise by 
means of a new idea which comes directly from the 
marriage of quantum mechanics with classical infor- 
mation theory, namely quantum error correction. Al- 
though the phrase 'error correction' is a natural one 
and was used with reference to quantum comput- 
ers prior to 1996, it was only in that year that two 
important papers, of Calderbank and Shor, and in- 
dependently Steane, established a general framework 
whereby quantum information processing can be used 
to combat a very wide class of noise processes in a 
properly designed quantum system. Much progress has 
since been made in generalising these ideas (Knill and 
Laflamme 1997, Ekert and Macchiavello 1996, Bennett 
et. al. 1996b, Gottesman 1996, Calderbank et. al. 
1997). An important development was the demonstra- 
tion by Shor (1996) and Kitaev (1996) that correction 
can be achieved even when the corrective operations 
are themselves imperfect. Such methods lead to a gen-
eral concept of 'fault tolerant' computing, of which a
helpful review is provided by Preskill (1997). 

If, as seems almost certain, quantum computation will 
only work in conjunction with quantum error correc- 
tion, it appears that the relationship between quantum 
information theory and quantum computers is even 
more intimate than that between Shannon's informa- 
tion theory and classical computers. Error correction 
does not in itself guarantee accurate quantum compu- 
tation, since it cannot combat all types of noise, but 
the fact that it is possible at all is a significant devel- 
opment. 

A computer which only exists on paper will not actu- 
ally perform any computations, and in the end the only 
way to resolve the issue of feasibility in quantum com- 
puter science is to build a quantum computer. To this 
end, a number of authors proposed computer designs 
based on Deutsch's idea, but with the physical details 
more fully worked out (Teich et. al. 1988, Lloyd 1993, 
Berman et. al. 1994, DiVincenzo 1995b). The great
challenge is to find a sufficiently complex system whose
evolution is nevertheless both coherent (i.e. unitary)
and controllable. It is not sufficient that only some as-
pects of a system should be quantum mechanical, as in
solid-state 'quantum dots', or that there is an implicit 
assumption of unfeasible precision or cooling, which is 
often the case for proposals using solid-state devices. 
Cirac and Zoller (1995) proposed the use of a linear ion 
trap, which was a significant improvement in feasibil- 
ity, since heroic efforts in the ion trapping community 
had already achieved the necessary precision and low 
temperature in experimental work, especially the group 
of Wineland who demonstrated cooling to the ground 
state of an ion trap in the same year (Diedrich et. al. 
1989, Monroe et. al. 1995). More recently, Gershen-
feld and Chuang (1997) and Cory et. al. (1996,1997)
have shown that nuclear magnetic resonance (NMR) 
techniques can be adapted to fulfill the requirements 
of quantum computation, making this approach also 
very promising. Other recent proposals of Privman et. 
al. (1997) and Loss and DiVincenzo (1997) may also 
be feasible. 

As things stand, no quantum computer has been built, 
nor looks likely to be built in the author's lifetime, if 
we measure it in terms of Shor's algorithm, and ask 
for factoring of large numbers. However, if we ask in-
stead for a device in which quantum information ideas
can be explored, then only a few quantum bits are re-
quired, and this will certainly be achieved in the near
future. Simple two-bit operations have been carried 
out in many physics experiments, notably magnetic 
resonance, and work with three to ten qubits now seems 
feasible. Notable recent experiments in this regard are 
those of Brune et. al. (1994), Monroe et. al. (1995b), 
Turchette et. al. (1995) and Mattle et. al. (1996).

2 Classical information theory

This and the next section will summarise the classical 
theory of information and computing. This is text- 
book material (Minsky 1967, Hamming 1986) but is 
included here since it forms a background to quantum 
information and computing, and the article is aimed at 
physicists to whom the ideas may be new. 



2.1 Measures of information 

The most basic problem in classical information the- 
ory is to obtain a measure of information, that is, of 
amount of information. Suppose I tell you the value of 
a number X. How much information have you gained? 
That will depend on what you already knew about X. 
For example, if you already knew X was equal to 2, 
you would learn nothing, no information, from my rev- 
elation. On the other hand, if previously your only 
knowledge was that X was given by the throw of a die, 
then to learn its value is to gain information. We have 
met here a basic paradoxical property, which is that 
information is often a measure of ignorance: the infor- 
mation content (or 'self-information') of X is defined 
to be the information you would gain if you learned the 
value of X. 

If X is a random variable which has value x with proba-
bility p(x), then the information content of X is defined
to be

    S(\{p(x)\}) = -\sum_x p(x) \log_2 p(x).    (1)

Note that the logarithm is taken to base 2, and that
S is never negative since probabilities are bounded by
p(x) ≤ 1. S is a function of the probability distribu-
tion of values of X. It is important to remember this,
since in what follows we will adopt the standard prac-
tice of using the notation S(X) for S({p(x)}). It is
understood that S(X) does not mean a function of X,
but rather the information content of the variable X.
The quantity S(X) is also referred to as an entropy,
for obvious reasons.

If we already know that X = 2, then p(2) = 1 and
there are no other terms in the sum, leading to S = 0,
so X has no information content. If, on the other hand,
X is given by the throw of a die, then p(x) = 1/6 for
x ∈ {1,2,3,4,5,6} so S = -log2(1/6) ≈ 2.58. If X can
take N different values, then the information content
(or entropy) of X is maximised when the probability
distribution p is flat, with every p(x) = 1/N (for ex-
ample a fair die yields S ≈ 2.58, but a loaded die with
p(6) = 1/2, p(1···5) = 1/10 yields S ≈ 2.16). This is
consistent with the requirement that the information
(what we would gain if we learned X) is maximum
when our prior knowledge of X is minimum.
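
The numbers quoted above are easy to check. The following is a minimal sketch in Python of equation (1); the function name entropy and the list representation of the distribution are illustrative choices, not part of the original text.

```python
import math

def entropy(probs):
    """Information content S in bits of a discrete distribution, equation (1)."""
    # Terms with p = 0 are skipped, following the convention 0 log 0 = 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

certain = [1.0]                        # X is already known, e.g. X = 2
fair_die = [1 / 6] * 6                 # p(x) = 1/6 for x = 1..6
loaded_die = [1 / 10] * 5 + [1 / 2]    # p(1..5) = 1/10, p(6) = 1/2

print(entropy(certain))                # no information content
print(round(entropy(fair_die), 2))     # 2.58
print(round(entropy(loaded_die), 2))   # 2.16
```

The flat distribution gives the larger value, in line with the statement that S is maximised when every p(x) = 1/N.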

Thus the maximum information which could in princi-
ple be stored by a variable which can take on N dif-
ferent values is log2(N). The logarithms are taken to
base 2 rather than some other base by convention. The
choice dictates the unit of information: S(X) = 1 when
X can take two values with equal probability. A two-
valued or binary variable thus can contain one unit of
information. This unit is called a bit. The two values
of a bit are typically written as the binary digits 0 and
1.

In the case of a binary variable, we can define p to be
the probability that X = 1, then the probability that
X = 0 is 1 - p and the information can be written as
a function of p alone:

    H(p) = -p \log_2 p - (1-p) \log_2 (1-p).    (2)

This function is called the entropy function; 0 ≤ H(p) ≤ 1.

In what follows, the subscript 2 will be dropped on
logarithms; it is assumed that all logarithms are to
base 2 unless otherwise indicated.

The probability that Y = y given that X = x is written
p(y|x). The conditional entropy S(Y|X) is defined by

    S(Y|X) = -\sum_x p(x) \sum_y p(y|x) \log p(y|x)    (3)

           = -\sum_x \sum_y p(x,y) \log p(y|x)    (4)

where the second line is deduced using p(x,y) =
p(x)p(y|x) (this is the probability that X = x and
Y = y). By inspection of the definition, we see that
S(Y|X) is a measure of how much information on av-
erage would remain in Y if we were to learn X. Note
that S(Y|X) ≤ S(Y) always and S(Y|X) ≠ S(X|Y)
usually.

The conditional entropy is important mainly as a
stepping-stone to the next quantity, the mutual infor-
mation, defined by

    I(X : Y) \equiv \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x) p(y)}    (5)

             = S(X) - S(X|Y)    (6)

From the definition, I(X : Y) is a measure of how
much X and Y contain information about each other^.
If X and Y are independent then p(x,y) = p(x)p(y)
so I(X : Y) = 0. The relationships between the basic
measures of information are indicated in fig. 3. The
reader may like to prove as an exercise that S(X,Y),
the information content of X and Y (the information
we would gain if, initially knowing neither, we learned
the value of both X and Y) satisfies S(X,Y) = S(X) +
S(Y) - I(X : Y).
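
The relationships between these measures can be checked numerically for any small joint distribution. The sketch below is illustrative only: the function names and the two example tables are assumptions made here, and the mutual information is computed from the standard sum of p(x,y) log[p(x,y)/(p(x)p(y))] over the joint distribution.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def information_measures(joint):
    """joint[x][y] = p(x, y). Returns S(X), S(Y), S(X,Y), S(X|Y), I(X:Y)."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    s_x, s_y = H(px), H(py)
    s_xy = H([p for row in joint for p in row])
    # S(X|Y) = -sum p(x,y) log p(x|y), with p(x|y) = p(x,y) / p(y)
    s_x_given_y = sum(-p * math.log2(p / py[j])
                      for row in joint
                      for j, p in enumerate(row) if p > 0)
    # I(X:Y) as a direct sum over the joint distribution
    mutual = sum(p * math.log2(p / (px[i] * py[j]))
                 for i, row in enumerate(joint)
                 for j, p in enumerate(row) if p > 0)
    return s_x, s_y, s_xy, s_x_given_y, mutual

correlated = [[0.4, 0.1], [0.2, 0.3]]          # hypothetical example table
independent = [[0.06, 0.14], [0.24, 0.56]]     # p(x,y) = p(x) p(y)

s_x, s_y, s_xy, s_x_given_y, mutual = information_measures(correlated)
print(mutual, s_x - s_x_given_y, s_x + s_y - s_xy)   # three equal values
print(information_measures(independent)[4])          # zero up to rounding
```

For the correlated table the three printed values agree, which is the exercise suggested above; for the independent table the mutual information vanishes.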

Information can disappear, but it cannot spring spon- 
taneously from nowhere. This important fact finds 
mathematical expression in the data processing inequal- 
ity: 

    if X \to Y \to Z then I(X : Z) \le I(X : Y).    (7)

The symbol X → Y → Z means that X, Y and Z form
a process (a Markov chain) in which Z depends on Y
but not directly on X: p(x,y,z) = p(x)p(y|x)p(z|y).
The content of the data processing inequality is that 
the 'data processor' Y can pass on to Z no more infor- 
mation about X than it received. 
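
As an illustration of the data processing inequality, the sketch below builds a small Markov chain in which X is passed to Y and then to Z through two bit-flipping stages. The source distribution and the flip probabilities are arbitrary values assumed for the example.

```python
import math

def mutual_information(joint):
    """I in bits between the row and column variables of joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (pa[i] * pb[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def flip_stage(q):
    """Transition table t[a][b] = p(b|a) for a stage that flips a bit with probability q."""
    return [[1 - q, q], [q, 1 - q]]

p_x = [0.3, 0.7]            # assumed source distribution
t_xy = flip_stage(0.1)      # p(y|x)
t_yz = flip_stage(0.2)      # p(z|y)

joint_xy = [[p_x[x] * t_xy[x][y] for y in range(2)] for x in range(2)]
# Markov chain: p(x,z) = sum over y of p(x) p(y|x) p(z|y)
joint_xz = [[sum(p_x[x] * t_xy[x][y] * t_yz[y][z] for y in range(2))
             for z in range(2)] for x in range(2)]

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(i_xy, i_xz)           # the second value is the smaller one
```

Whatever values are chosen for the two stages, the second number never exceeds the first.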



^ Many authors write I(X;Y) rather than I(X : Y). I prefer
the latter since the symmetry of the colon reflects the fact that
I(X : Y) = I(Y : X).



2.2 Data compression

Having pulled the definition of information content,
equation (1), out of a hat, our aim is now to prove
that this is a good measure of information. It is not
obvious at first sight even how to think about such a
task. One of the main contributions of classical infor-
mation theory is to provide useful ways to think about
information. We will describe a simple situation in
order to illustrate the methods. Let us suppose one
person, traditionally called Alice, knows the value of
X, and she wishes to communicate it to Bob. We re-
strict ourselves to the simple case that X has only two
possible values: either 'yes' or 'no'. We say that Alice
is a 'source' with an 'alphabet' of two symbols. Alice
communicates by sending binary digits (noughts and
ones) to Bob. We will measure the information con-
tent of X by counting how many bits Alice must send,
on average, to allow Bob to learn X. Obviously, she
could just send 0 for 'no' and 1 for 'yes', giving a 'bit
rate' of one bit per X value communicated. However,
what if X were an essentially random variable, except
that it is more likely to be 'no' than 'yes'? (Think of
the output of decisions from a grant funding body, for
example.) In this case, Alice can communicate more
efficiently by adopting the following procedure.

Let p be the probability that X = 1 and 1 - p be the
probability that X = 0. Alice waits until n values of
X are available to be sent, where n will be large. The
mean number of ones in such a sequence of n values
is np, and it is likely that the number of ones in any
given sequence is close to this mean. Suppose np is
an integer, then the probability of obtaining any given
sequence containing np ones is

    p^{np} (1-p)^{n-np} = 2^{-nH(p)}.    (8)

The reader should satisfy him or herself that the two 
sides of this equation are indeed equal: the right hand 
side hints at how the argument can be generalised. 
Such a sequence is called a typical sequence. To be 
specific, we define the set of typical sequences to be all 
sequences such that 

    2^{-n(H(p)+\epsilon)} < p(\text{sequence}) < 2^{-n(H(p)-\epsilon)}    (9)

Now, it can be shown that the probability that Alice's
n values actually form a typical sequence is greater
than 1 - ε, for sufficiently large n, no matter how small
ε is. This implies that Alice need not communicate
n bits to Bob in order for him to learn n decisions.
She need only tell Bob which typical sequence she has.
They must agree together beforehand how the typical
sequences are to be labelled: for example, they may
agree to number them in order of increasing binary
value. Alice just sends the label, not the sequence it-
self. To deduce how well this works, it can be shown
that the typical sequences all have equal probability,
and there are 2^{nH(p)} of them. To communicate one
of 2^{nH(p)} possibilities, clearly Alice must send nH(p)
bits. Also, Alice cannot do better than this (i.e. send
fewer bits) since the typical sequences are equiproba-
ble: there is nothing to be gained by further manipu-
lating the information. Therefore, the information con-
tent of each value of X in the original sequence must
be H(p), which proves (1).
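
A short numerical check may help to make the counting argument concrete. The sketch below, with n = 1000 and p = 1/4 chosen arbitrarily, compares the two sides of equation (8) on a logarithmic scale and counts the sequences containing exactly np ones. Note that this count covers only one part of the typical set, so it falls slightly below 2^{nH(p)}; the two agree in the exponent to leading order in n.

```python
import math

def H(p):
    """Entropy function of equation (2), in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.25
ones = int(n * p)                 # np is an integer for these values

# Equation (8) on a log2 scale, to avoid numerical underflow
log2_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
print(log2_prob, -n * H(p))       # the two sides agree

# Sequences with exactly np ones: log2 of the binomial coefficient
log2_count = math.log2(math.comb(n, ones))
print(log2_count / n, H(p))       # bits per value, close to H(p)
```

Increasing n brings the ratio log2_count / n closer to H(p), which is the sense in which nH(p) bits suffice.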

The mathematical details skipped over in the above
argument all stem from the law of large numbers, which
states that, given arbitrarily small ε, δ,

    P(|m - np| < n\epsilon) > 1 - \delta    (10)

for sufficiently large n, where m is the number of ones
obtained in a sequence of n values. For large enough n,
the number of ones m will differ from the mean np by
an amount arbitrarily small compared to n. For exam-
ple, in our case the noughts and ones will be distributed
according to the binomial distribution

    P(n,m) = C(n,m) p^m (1-p)^{n-m}    (11)

           \simeq \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(m-np)^2}{2\sigma^2} \right)    (12)

where the Gaussian form is obtained in the limit
n, np → ∞, with the standard deviation σ =
sqrt(np(1 - p)), and C(n,m) = n!/(m!(n - m)!).

The above argument has already yielded a signifi-
cant practical result associated with (1). This is that
to communicate n values of X, we need only send
nS(X) ≤ n bits down a communication channel. This
idea is referred to as data compression, and is also
called Shannon's noiseless coding theorem.

The typical sequences idea has given a means to calcu- 
late information content, but it is not the best way to 
compress information in practice, because Alice must 
wait for a large number of decisions to accumulate 
before she communicates anything to Bob. A better 
method is for Alice to accumulate a few decisions, say 
4, and communicate this as a single 'message' as best 
she can. Huffman derived an optimal method whereby 
Alice sends short strings to communicate the most 
likely messages, and longer ones to communicate the 
least likely messages; see table 1 for an example. The
translation process is referred to as 'encoding' and 'de- 
coding' (fig. 4); this terminology does not imply any 
wish to keep information secret. 

For the case p = 1/4 Shannon's noiseless coding the-
orem tells us that the best possible data compression
technique would communicate each message of four X
values by sending on average 4H(1/4) ≈ 3.245 bits.
The Huffman code in table 1 gives on average 3.273 bits
per message. This is quite close to the minimum, show-
ing that practical methods like Huffman's are powerful.
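
The two figures just quoted can be reproduced with a short sketch of Huffman's construction applied to the sixteen possible four-value messages at p = 1/4. The code is a minimal illustration written for this purpose: the function name and data layout are assumptions, and the codewords it implies need not coincide with those listed in table 1, although the average length of any Huffman code for this source is the same.

```python
import heapq
import itertools
import math

def huffman_lengths(weights):
    """Codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (weight, unique tie-breaker, symbols below this node)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = itertools.count(len(weights))
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)     # the two least likely nodes
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                 # each symbol below gains one bit
        heapq.heappush(heap, (w1 + w2, next(counter), s1 + s2))
    return lengths

p = 0.25
messages = list(itertools.product([0, 1], repeat=4))
# A message with k ones has probability 3^(4-k) / 256; integer weights stay exact
weights = [3 ** (4 - sum(m)) for m in messages]
lengths = huffman_lengths(weights)

average_bits = sum(w * l for w, l in zip(weights, lengths)) / 256
shannon_bits = 4 * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
print(round(average_bits, 3), round(shannon_bits, 3))   # 3.273 3.245
```

The most likely message, 0000, receives the shortest codeword and the least likely, 1111, one of the longest.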

Data compression is a concept of great practical impor- 
tance. It is used in telecommunications, for example 
to compress the information required to convey tele- 
vision pictures, and data storage in computers. From 
the point of view of an engineer designing a commu- 
nication channel, data compression can appear mirac- 
ulous. Suppose we have set up a telephone link to a 
mountainous area, but the communication rate is not 
high enough to send, say, the pixels of a live video 
image. The old-style engineering option would be to 
replace the telephone link with a faster one, but infor- 
mation theory suggests instead the possibility of using 
the same link, but adding data processing at either end 
(data compression and decompression). It comes as a 
great surprise that the usefulness of a cable can thus 
be improved by tinkering with the information instead 
of the cable. 



2.3 The binary symmetric channel 

So far we have considered the case of communication 
down a perfect, i.e. noise-free channel. We have gained 
two main results of practical value: a measure of the 
best possible data compression (Shannon's noiseless 
coding theorem), and a practical method to compress 
data (Huffman coding). We now turn to the important
question of communication in the presence of noise. As 
in the last section, we will analyse the simplest case in 
order to illustrate principles which are in fact more 
general. 

Suppose we have a binary channel, i.e. one which al-
lows Alice to send noughts and ones to Bob. The noise-
free channel conveys 0 → 0 and 1 → 1, but a noisy
channel might sometimes cause 0 to become 1 and vice
versa. There is an infinite variety of different types of
noise. For example, the erroneous 'bit flip' 0 → 1 might
be just as likely as 1 → 0, or the channel might have
a tendency to 'relax' towards 0, in which case 1 → 0
happens but 0 → 1 does not. Also, such errors might
occur independently from bit to bit, or occur in bursts.

A very important type of noise is one which affects
different bits independently, and causes both 0 → 1
and 1 → 0 errors. This is important because it captures
the essential features of many processes encountered
in realistic situations. If the two errors 0 → 1 and
1 → 0 are equally likely, then the noisy channel is called
a 'binary symmetric channel'. The binary symmetric 
channel has a single parameter, p, which is the error 
probability per bit sent. Suppose the message sent into 
the channel by Alice is X, and the noisy message which 
Bob receives is Y. Bob is then faced with the task of 
deducing X as best he can from Y. If X consists of a
single bit, then Bob will make use of the conditional
probabilities

    p(x = 0 | y = 0) = p(x = 1 | y = 1) = 1 - p
    p(x = 0 | y = 1) = p(x = 1 | y = 0) = p

giving S(X|Y) = H(p) using equations (2) and (3).
Therefore, from the definition (6) of mutual informa-
tion, we have

    I(X : Y) = S(X) - H(p)    (13)

Clearly, the presence of noise in the channel limits the
information about Alice's X contained in Bob's re-
ceived Y. Also, because of the data processing inequal-
ity, equation (7), Bob cannot increase his information
about X by manipulating Y. However, (13) shows that
Alice and Bob can communicate better if S(X) is large.
The general insight is that the information communi-
cated depends both on the source and the properties
of the channel. It would be useful to have a measure
of the channel alone, to tell us how well it conveys in-
formation. This quantity is called the capacity of the
channel and it is defined to be the maximum possi-
ble mutual information I(X : Y) between the input
and output of the channel, maximised over all possible
sources:

    Channel capacity C \equiv \max_{\{p(x)\}} I(X : Y)    (14)

Channel capacity is measured in units of 'bits out per
symbol in' and for binary channels must lie between
zero and one.

It is all very well to have a definition, but (14) does
not allow us to compare channels very easily, since we
have to perform the maximisation over input strategies,
which is non-trivial. To establish the capacity C(p) of
the binary symmetric channel is a basic problem in
information theory, but fortunately this case is quite
simple. From equations (13) and (14) one may see
that the answer is

    C(p) = 1 - H(p),    (15)

obtained when S(X) = 1 (i.e. P(x = 0) = P(x = 1) =
1/2).
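
The maximisation over sources can also be carried out by brute force for the binary symmetric channel. The sketch below scans a grid of source probabilities for an assumed error probability of 0.1. Here the mutual information is evaluated in the equivalent form S(Y) - S(Y|X), which is convenient because S(Y|X) = H(p) for this channel whatever the source; the function names and grid size are choices made for the example.

```python
import math

def H(p):
    """Entropy function in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(source_p1, flip_p):
    """I(X:Y) when P(x=1) = source_p1 and each bit is flipped with probability flip_p."""
    # P(y=1): a 1 arrives intact, or a 0 is flipped
    y1 = source_p1 * (1 - flip_p) + (1 - source_p1) * flip_p
    # I(X:Y) = S(Y) - S(Y|X), with S(Y|X) = H(flip_p)
    return H(y1) - H(flip_p)

flip_p = 0.1
grid = [i / 1000 for i in range(1001)]      # candidate values of P(x=1)
best_source = max(grid, key=lambda q: bsc_mutual_information(q, flip_p))
capacity = bsc_mutual_information(best_source, flip_p)

print(best_source)                  # 0.5
print(capacity, 1 - H(flip_p))      # both equal C(p) of equation (15)
```

The maximum falls at the unbiased source, for which S(X) = 1, and its value reproduces 1 - H(p).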



2.4 Error-correcting codes 

So far we have investigated how much information gets 
through a noisy channel, and how much is lost. Alice 
cannot convey to Bob more information than C(p) per
symbol communicated. However, suppose Bob is busy
defusing a bomb and Alice is shouting from a distance
which wire to cut: she will not say "the blue wire" just
once, and hope that Bob heard correctly. She will re-
peat the message many times, and Bob will wait until
he is sure to have got it right. Thus error-free commu-
nication can be achieved even over a noisy channel. In
this example one obtains the benefit of reduced error
rate at the sacrifice of reduced information rate. The
next stage of our information theoretic programme is to
identify more powerful techniques to circumvent noise
(Hamming 1986, Hill 1986, Jones 1979, MacWilliams
and Sloane 1977).
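
The trade-off in Alice's shouting strategy can be quantified with a simple model, assumed here for illustration: each bit is sent n times over a binary symmetric channel with error probability p, and Bob takes a majority vote. The function name is hypothetical.

```python
import math

def majority_error(n, p):
    """Probability that a majority vote over n repeats (n odd) gives the wrong bit."""
    # The vote fails when more than half of the n copies are flipped
    return sum(math.comb(n, m) * p**m * (1 - p)**(n - m)
               for m in range(n // 2 + 1, n + 1))

p = 0.1
for n in (1, 3, 5, 11):
    # columns: repeats, information rate 1/n, residual error probability
    print(n, 1 / n, majority_error(n, p))
```

The error probability falls rapidly with n, but the rate 1/n falls too, which is why more powerful codes are worth seeking.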

We will need the following concepts. The set {0, 1} is
considered as a group (a Galois field GF(2)) where the
operations +, -, ×, ÷ are carried out modulo 2 (thus,
1 + 1 = 0). An n-bit binary word is a vector of n
components, for example 011 is the vector (0, 1, 1). A
set of such vectors forms a vector space under addition,
since for example 011 + 101 means (0, 1, 1) + (1, 0, 1) =
(0+1, 1+0, 1+1) = (1, 1, 0) = 110 by the standard rules
of vector addition. This is equivalent to the exclusive-
or operation carried out bitwise between the two binary
words.
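
The equivalence between addition modulo 2 and the exclusive-or operation can be seen in a few lines. Both helper functions below are illustrative names introduced for this sketch, with binary words held as strings.

```python
def add_words(a, b):
    """Add two equal-length binary words component by component, modulo 2."""
    assert len(a) == len(b)
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))

def xor_words(a, b):
    """The same operation using the integer bitwise exclusive-or."""
    return format(int(a, 2) ^ int(b, 2), "0{}b".format(len(a)))

print(add_words("011", "101"))   # 110
print(xor_words("011", "101"))   # 110
```

Adding a word to itself gives the zero word, so in this arithmetic subtraction and addition coincide.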

The effect of noise on a word u can be expressed u →
u' = u + e, where the error vector e indicates which
bits in u were flipped by the noise. For example, u =
1001101 → u' = 1101110 can be expressed u' = u +
0100011. An error correcting code C is a set of words
such that

    u + e \ne v + f    \forall u, v \in C \ (u \ne v), \ \forall e, f \in E    (16)

where E is the set of errors correctable by C, which in-
cludes the case of no error, e = 0. To use such a code,
Alice and Bob agree on which codeword u corresponds
to which message, and Alice only ever sends codewords
down the channel. Since the channel is noisy, Bob re-
ceives not u but u + e. However, Bob can deduce u
unambiguously from u + e since by condition (16), no
other codeword v sent by Alice could have caused Bob
to receive u + e.

An example error-correcting code is shown in the right-
hand column of table 1. This is a [7,4,3] Hamming
code, named after its discoverer. The notation [n,k,d]
means that the codewords are n bits long, there are
2^k of them, and they all differ from each other in
at least d places. Because of the latter feature, the
condition (16) is satisfied for any error which affects
at most one bit. In other words the set E of cor-
rectable errors is {0000000, 1000000, 0100000, 0010000,
0001000, 0000100, 0000010, 0000001}. Note that E can
have at most 2^{n-k} members. The ratio k/n is called
the rate of the code, since each block of n transmitted
bits conveys k bits of information, thus k/n bits per
bit.
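
These properties can be verified by enumeration. The sketch below generates a [7,4,3] Hamming code from four generator words, which are an assumption made here (one standard choice); the codewords of table 1 may be a different but equivalent set. It then checks the number of codewords, the minimum distance, and the condition that corrupted versions of distinct codewords never coincide.

```python
import itertools

# Assumed generator words for a [7,4,3] Hamming code (one standard choice)
GENERATOR = ["1000110", "0100101", "0010011", "0001111"]

def add(a, b):
    """Bitwise addition modulo 2 of two binary words."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def codewords(generator):
    """All sums of subsets of the generator words."""
    words = set()
    for coeffs in itertools.product([0, 1], repeat=len(generator)):
        word = "0" * len(generator[0])
        for c, row in zip(coeffs, generator):
            if c:
                word = add(word, row)
        words.add(word)
    return sorted(words)

def distance(a, b):
    """Number of places in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

CODE = codewords(GENERATOR)
min_distance = min(distance(u, v) for u, v in itertools.combinations(CODE, 2))

# Correctable set E: no error, plus each of the seven single-bit errors
ERRORS = ["0000000"] + ["0" * i + "1" + "0" * (6 - i) for i in range(7)]
# If all the words u + e are distinct, then u + e never equals v + f for u != v
received = [add(u, e) for u in CODE for e in ERRORS]
condition_holds = len(set(received)) == len(CODE) * len(ERRORS)

print(len(CODE), min_distance, condition_holds)   # 16 3 True
```

Since 16 codewords times 8 errors gives all 128 seven-bit words, E has exactly 2^{n-k} = 8 members for this code, the largest number allowed.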

The parameter d is called the 'minimum distance' of
the code, and is important when encoding for noise
which affects successive bits independently, as in the
binary symmetric channel. For, a code of minimum
distance d can correct all errors affecting fewer than d/2
bits of the transmitted codeword, and for independent
noise this is the most likely set of errors. In fact, the
probability that an n-bit word receives m errors is given
by the binomial distribution (11), so if the code can
correct more than the mean number of errors np, the
correction is highly likely to succeed.

The central result of classical information theory is that 
powerful error correcting codes exist: 

Shannon's theorem: If the rate k/n < C{p) 
and n is sufficiently large, there exists a bi- 
nary code allowing transmission with an ar- 
bitrarily small error probability. 

The error probability here is the probability that an 
uncorrectable error occurs, causing Bob to misinter- 
pret the received word. Shannon's theorem is highly 
surprising, since it implies that it is not necessary to en-
gineer very low-noise communication channels, an ex-
pensive and difficult task. Instead, we can compensate
for noise by error correction coding and decoding, that is,
by information processing! The meaning of Shannon's 
theorem is illustrated by fig. 5. 

The main problem of coding theory is to identify codes 
with large rate k/n and large distance d. These two 
conditions are mutually incompatible, so a compromise 
is needed. The problem is notoriously difficult and has 
no general solution. To make connection with quan- 
tum error correction, we will need to mention one im- 
portant concept, that of the parity check matrix. An 
error correcting code is called linear if it is closed under 
addition, i.e. u + v ∈ C ∀u, v ∈ C. Such a code is com-
pletely specified by its parity check matrix H, which is
a set of (n - k) linearly independent n-bit words sat-
isfying H · u = 0 ∀u ∈ C. The important property is
encapsulated by the following equation:

    H \cdot (u + e) = (H \cdot u) + (H \cdot e) = H \cdot e.    (17)

This states that if Bob evaluates H · u' for his noisy re-
ceived word u' = u + e, he will obtain the same answer
H · e, no matter what word u Alice sent him! If this
evaluation were done automatically, Bob could learn
H · e, called the error syndrome, without learning u. If
Bob can deduce the error e from H · e, which one can
show is possible for all correctable errors, then he can
correct the message (by subtracting e from it) without
ever learning what it was! In quantum error correc-
tion, this is the underlying reason one can correct a
quantum state without disturbing it.
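
Syndrome extraction and correction can be sketched for a [7,4,3] Hamming code as follows. The three parity check words are an assumption made for this example (one standard choice, not necessarily the matrix belonging to table 1), and the lookup table from syndrome to error is an illustrative decoding strategy.

```python
# Assumed parity check words for a [7,4,3] Hamming code: the seven columns
# are the seven distinct non-zero 3-bit patterns.
PARITY_CHECK = ["1101100", "1011010", "0111001"]

def add(a, b):
    """Bitwise addition modulo 2 of two binary words."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def syndrome(word):
    """H . word over GF(2), as a tuple of three parity bits."""
    return tuple(sum(int(h) & int(w) for h, w in zip(row, word)) % 2
                 for row in PARITY_CHECK)

# Table from syndrome to error: the zero syndrome means no error
SINGLE_ERRORS = ["0" * i + "1" + "0" * (6 - i) for i in range(7)]
LOOKUP = {syndrome(e): e for e in SINGLE_ERRORS}
LOOKUP[(0, 0, 0)] = "0000000"

def correct(received):
    """Remove the error indicated by the syndrome, without decoding the message."""
    error = LOOKUP[syndrome(received)]
    return add(received, error)       # subtracting equals adding, modulo 2

sent = "1000110"                      # a codeword, so its syndrome is zero
received = add(sent, "0000100")       # the channel flips the fifth bit

print(syndrome(sent))                 # (0, 0, 0)
print(syndrome(received) == syndrome("0000100"))   # True: depends on e alone
print(correct(received) == sent)      # True
```

The function correct never needs to know which codeword was sent, which is the feature that carries over to the quantum case.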

3 Classical theory of computation

We now turn to the theory of computation. This is 
mostly concerned with the questions "what is com- 
putable?" and "what resources are necessary?" 

The fundamental resources required for computing are 
a means to store and to manipulate symbols. The im- 
portant questions are such things as how complicated 
must the symbols be, how many will we need, how com- 
plicated must the manipulations be, and how many of 
them will we need? 

The general insight is that computation is deemed hard 
or inefficient if the amount of resources required rises
exponentially with a measure of the size of the prob-
lem to be addressed. The size of the problem is given
by the amount of information required to specify the
problem. Applying this idea at the most basic level, we
find that a computer must be able to manipulate bi-
nary symbols, not just unary symbols^, otherwise the
number of memory locations needed would grow ex- 
ponentially with the amount of information to be ma- 
nipulated. On the other hand, it is not necessary to 
work in decimal notation (10 symbols) or any other 
notation with an 'alphabet' of more than two symbols. 
This greatly simplifies computer design and analysis. 

To manipulate n binary symbols, it is not necessary 
to manipulate them all at once, since it can be shown 
that any transformation can be brought about by ma- 
nipulating the binary symbols one at a time or in pairs. 
A binary 'logic gate' takes two bits x, y as inputs, and
calculates a function f(x,y). Since f can be 0 or 1,
and there are four possible inputs, there are 16 possi-
ble functions f. This set of 16 different logic gates is
called a 'universal set', since by combining such gates
in series, any transformation of n bits can be carried
out. Furthermore, the action of some of the 16 gates
can be reproduced by combining others, so we do not
need all 16, and in fact only one, the NAND gate, is
necessary (NAND is NOT AND, for which the output is
0 if and only if both inputs are 1).
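
The claim that NAND alone suffices can be illustrated by building a few familiar gates from it. The constructions below are standard ones, written out as a sketch; the function names are chosen for the example.

```python
import itertools

def nand(x, y):
    """Output is 0 if and only if both inputs are 1."""
    return 0 if (x == 1 and y == 1) else 1

def not_(x):
    return nand(x, x)

def and_(x, y):
    return not_(nand(x, y))

def or_(x, y):
    return nand(not_(x), not_(y))

def xor(x, y):
    t = nand(x, y)
    return nand(nand(x, t), nand(y, t))

# Truth tables over the four possible inputs 00, 01, 10, 11
inputs = list(itertools.product([0, 1], repeat=2))
print([and_(x, y) for x, y in inputs])   # [0, 0, 0, 1]
print([or_(x, y) for x, y in inputs])    # [0, 1, 1, 1]
print([xor(x, y) for x, y in inputs])    # [0, 1, 1, 0]

# Each of the four inputs can map to 0 or 1, giving 2^4 = 16 functions f
print(len(list(itertools.product([0, 1], repeat=4))))   # 16
```

Each composite gate calls nothing but nand, so a supply of NAND gates and wires is enough to realise them.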

By concatenating logic gates, we can manipulate n-bit 
symbols (see fig. 6). This general approach is called 
the network model of computation, and is useful for 
our purposes because it suggests the model of quan- 
tum computation which is currently most feasible ex- 
perimentally. In this model, the essential components 
of a computer are a set of bits, many copies of the 
universal logic gate, and connecting wires. 



^ Unary notation has a single symbol, 1. The positive integers
are written 1, 11, 111, 1111, ...



3.1 Universal computer; Turing machine

The word 'universal' has a further significance in rela-
tion to computers. Turing showed that it is possible to
construct a universal computer, which can simulate the
action of any other, in the following sense. Let us write
T(x) for the output of a Turing machine T (fig. 7) act-
ing on input tape x. Now, a Turing machine can be
completely specified by writing down how it responds
to 0 and 1 on the input tape, for every possible inter-
nal configuration of the machine (of which there are a
finite number). This specification can itself be written
as a binary number d[T]. Turing showed that there
exists a machine U, called a universal Turing machine,
with the properties

    U(d[T], x) = T(x)    (18)

and the number of steps taken by U to simulate each
step of T is only a polynomial (not exponential) func-
tion of the length of d[T]. In other words, if we provide
U with an input tape containing both a description of
T and the input x, then U will compute the same func-
tion as T would have done, for any machine T, without
an exponential slow-down.
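
The idea behind equation (18) can be conveyed by a small simulator which takes a description of a machine and an input tape as its two arguments. This is only an analogy, not Turing's construction: the simulator is an ordinary Python function rather than a Turing machine, and the table format used for the description, together with the two example machines, are assumptions made for this sketch.

```python
def run(description, tape, start="scan", halt="halt", blank="_", max_steps=10000):
    """Apply the machine given by `description` to `tape`; return the final tape."""
    cells = dict(enumerate(tape))       # tape position -> symbol
    state, head, steps = start, 0, 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        # The description says what to do for each (state, symbol) pair
        state, write, move = description[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Hypothetical machine 1: invert every bit, halting at the first blank
INVERT = {
    ("scan", "0"): ("scan", "1", "R"),
    ("scan", "1"): ("scan", "0", "R"),
    ("scan", "_"): ("halt", "_", "R"),
}

# Hypothetical machine 2: append a 1, i.e. add one in unary notation
SUCCESSOR = {
    ("scan", "1"): ("scan", "1", "R"),
    ("scan", "_"): ("halt", "1", "R"),
}

print(run(INVERT, "1011"))      # 0100
print(run(SUCCESSOR, "111"))    # 1111
```

The same function run reproduces the behaviour of either machine once it is handed the corresponding table, and each simulated step costs one table lookup.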

To complete the argument, it can be shown that other 
models of computation, such as the network model, are 
computationally equivalent to the Turing model: they 
permit the same functions to be computed, with the 
same computational efficiency (see next section). Thus
the concept of the universal machine establishes that a
certain finite degree of complexity of construction is
sufficient to allow very general information processing.
This is the fundamental result of computer science. In-
deed, the power of the Turing machine and its cousins is
so great that Church (1936) and Turing (1936) framed
the "Church-Turing thesis," to the effect that

Every function 'which would naturally be regarded as 
computable' can be computed by the universal Turing 
machine. 

This thesis is unproven, but has survived many at- 
tempts to find a counterexample, making it a very 
powerful result. To it we owe the versatility of the 
modern general-purpose computer, since 'computable 
functions' include tasks such as word processing, pro-
cess control, and so on. The quantum computer, to
be described in a later section, will throw new light
on this central thesis.



3.2 Computational complexity

Once we have established the idea of a universal com- 
puter, computational tasks can be classified in terms 
of their difficulty in the following manner. A given al- 
gorithm is deemed to address not just one instance of 
a problem, such as "find the square of 237," but one 
class of problem, such as "given x, find its square." The 
amount of information given to the computer in order 
to specify the problem is L = log x, i.e. the number of 
bits needed to store the value of x. The computational 
complexity of the problem is determined by the num-
ber of steps s a Turing machine must make in order to
complete any algorithmic method to solve the problem.
In the network model, the complexity is determined by
the number of logic gates required. If an algorithm ex-
ists with s given by any polynomial function of L (e.g.
s ∝ L^3 + L) then the problem is deemed tractable and
is placed in the complexity class "P". If s rises expo-
nentially with L (e.g. s ∝ 2^L = x) then the problem is
hard and is in another complexity class. It is often eas-
ier to verify a solution, that is, to test whether or not
it is correct, than to find one. The class "NP" is the set
of problems for which solutions can be verified in poly-
nomial time. Obviously P ⊆ NP, and one would guess
that there are problems in NP which are not in P (i.e.
NP ≠ P), though surprisingly the latter has never been
proved, since it is very hard to rule out the possible 
existence of as yet undiscovered algorithms. However, 
the important point is that the membership of these 
classes does not depend on the model of computation, 
i.e. the physical realisation of the computer, since the 
Turing machine can simulate any other computer with 
only a polynomial, rather than exponential slow-down. 

An important example of an intractable problem is 
that of factorisation: given a composite (i.e. non-
prime) number x, the task is to find one of its fac-
tors. If x is even, or a multiple of any small number,
then it is easy to find a factor. The interesting case is
when the prime factors of x are all themselves large.
In this case there is no known simple method. The
best known method, the number field sieve (Menezes
et. al. 1997) requires a number of computational steps
of order s ~ exp(2 L^{1/3} (log L)^{2/3}) where L = ln x. By
devoting a substantial machine network to this task,
one can today factor a number of 130 decimal digits
(Crandall 1997), i.e. L ≈ 300, giving s ~ 10^18. This is
time-consuming but possible (for example 42 days at
10^12 operations per second). However, if we double L, s
increases to ~ 10^25, so now the problem is intractable:
it would take a million years with current technology, 
or would require computers running a million times 
faster than current ones. The lesson is an important 
one: a computationally 'hard' problem is one which in 
practice is not merely difficult but impossible to solve. 
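
The arithmetic behind these estimates can be checked with a
short sketch. It simply evaluates the scaling formula quoted
above; the function names and the rate of $10^{12}$ operations
per second used as a default are illustrative assumptions, and
the formula is only an order-of-magnitude guide.

```python
import math

def nfs_steps(L):
    """Order-of-magnitude step count s ~ exp(2 L^(1/3) (ln L)^(2/3)), with L = ln x."""
    return math.exp(2.0 * L ** (1.0 / 3.0) * math.log(L) ** (2.0 / 3.0))

def days_needed(steps, ops_per_second=1e12):
    """Running time in days at an assumed constant operation rate."""
    return steps / ops_per_second / 86400.0

for digits in (130, 260):
    L = digits * math.log(10)  # L = ln x for a number with this many decimal digits
    s = nfs_steps(L)
    print(f"{digits} digits: L = {L:.0f}, log10(s) = {math.log10(s):.1f}, "
          f"days at 1e12 ops/s = {days_needed(s):.3g}")
```

Doubling the number of digits multiplies the estimated work
by roughly seven orders of magnitude, which is the point made
in the text.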

The factorisation problem has acquired great practical
importance because it is at the heart of widely used
cryptographic systems such as that of Rivest, Shamir and
Adleman (1979) (see Hellman 1979). For, given a message M
(in the form of a long binary number), it is easy to
calculate an encrypted version $E = M^s \bmod c$ where s and
c are well-chosen large integers which can be made public.
To decrypt the message, the receiver calculates
$E^t \bmod c$, which is equal to M for a value of t which can
be quickly deduced from s and the factors of c (Schroeder
1984). In practice $c = pq$ is chosen to be the product of
two large primes p, q known only to the user who published
c, so only that user can read the messages unless someone
manages to factorise c. It is a very useful feature that no
secret keys need be distributed in such a system: the 'key'
c, s allowing encryption is public knowledge.
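
A toy version of this scheme, using the symbols M, E, s, t
and c of the text, might look as follows. The primes 61 and
53 and the exponent 17 are illustrative assumptions chosen
to keep the numbers small; real systems use primes hundreds
of digits long, and the helper names are hypothetical.

```python
from math import gcd

def make_keys(p, q, s):
    """Return (c, s, t): public modulus c = p*q, public exponent s, private exponent t."""
    c = p * q
    phi = (p - 1) * (q - 1)      # computable only by someone who knows the factors of c
    assert gcd(s, phi) == 1      # s must be invertible modulo phi
    t = pow(s, -1, phi)          # modular inverse (Python 3.8 or later)
    return c, s, t

def encrypt(M, s, c):
    return pow(M, s, c)          # E = M^s mod c

def decrypt(E, t, c):
    return pow(E, t, c)          # E^t mod c recovers M

c, s, t = make_keys(61, 53, 17)
E = encrypt(65, s, c)
print(c, s, t, E, decrypt(E, t, c))
```

Anyone may encrypt with the public pair (c, s), but computing
t requires (p - 1)(q - 1) and hence the factors of c.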



3.3 Uncomputable functions 

There is an even stronger way in which a task may be 
impossible for a computer. In the quest to solve some 
problem, we could 'live with' a slow algorithm, but 
what if one does not exist at all? Such problems are 
termed uncomputable. The most important example is 
the "halting problem", a rather beautiful result. A fea- 
ture of computers familiar to programmers is that they 
may sometimes be thrown into a never-ending loop. 
Consider, for example, the instruction "while x > 2, 
divide x by 1" for x initially greater than 2. We can 
see that this algorithm will never halt, without actu- 
ally running it. More interesting from a mathematical 
point of view is an algorithm such as "while x is equal 
to the sum of two primes, add 2 to x, otherwise print 
x and halt", beginning at x = 8. The algorithm is certainly
feasible since all pairs of primes less than x can be found
and added systematically. Will such an algorithm ever halt?
If so, then a counterexample to the Goldbach conjecture
exists. Using such techniques, a vast section of
mathematical and physical theory could be reduced to the
question "would such and such an
algorithm halt if we were to run it?" If we could find 
a general way to establish whether or not algorithms 
will halt, we would have an extremely powerful math- 
ematical tool. In a certain sense, it would solve all of 
mathematics! 
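
The Goldbach loop described above can be written down
directly. In this sketch the cut-off `max_x` is an added
assumption so that the program is guaranteed to return;
without it, nobody knows whether the loop would ever halt.
The function names are illustrative.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_sum_of_two_primes(x):
    """Systematically test all pairs of primes adding up to x."""
    return any(is_prime(p) and is_prime(x - p) for p in range(2, x // 2 + 1))

def goldbach_search(start=8, max_x=10_000):
    """Return the first even x in [start, max_x] that is not a sum of two primes, else None."""
    x = start
    while x <= max_x:
        if not is_sum_of_two_primes(x):
            return x          # this would be a counterexample to the conjecture
        x += 2
    return None               # no counterexample below the cut-off

print(goldbach_search())
```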

Let us suppose that it is possible to find a general
algorithm which will work out whether any Turing machine
will halt on any input. Such an algorithm solves the problem
"given x and d[T], would Turing machine T halt if it were
fed x as input?". Here d[T] is the description of T. If such
an algorithm exists, then it is possible to make a Turing
machine $T_H$ which halts if and only if T(d[T]) does not
halt. Here $T_H$ takes as input d[T], which is sufficient to
tell $T_H$ about both the Turing machine T and the input to
T. Hence we have

$$T_H(d[T]) \text{ halts} \iff T(d[T]) \text{ does not halt} \tag{19}$$

So far everything is ok. However, what if we feed $T_H$ the
description of itself, $d[T_H]$? Then

$$T_H(d[T_H]) \text{ halts} \iff T_H(d[T_H]) \text{ does not halt} \tag{20}$$

which is a contradiction. By this argument Turing 
showed that there is no automatic means to estab- 
lish whether Turing machines will halt in general: the 
"halting problem" is uncomputable. This implies that 
mathematics, and information processing in general, 
is a rich body of different ideas which cannot all be 
summarised in one grand algorithm. This liberating 
observation is closely related to Gödel's theorem.
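
The diagonal construction can be mimicked in a programming
language, with functions standing in for Turing machines and
a function object standing in for its own description. This
is only an analogy under those assumptions: `halts` is a
hypothetical candidate decider supplied by the caller, and
the sketch shows that whatever it answers about the diagonal
program is wrong.

```python
def make_diagonal(halts):
    """Build the analogue of T_H from a claimed halting decider halts(program, data)."""
    def diagonal(program):
        if halts(program, program):
            while True:       # loop forever exactly when the decider predicts halting
                pass
        return "halted"       # halt exactly when the decider predicts looping
    return diagonal

def prediction_versus_behaviour(halts):
    """Compare the decider's verdict on diagonal(diagonal) with what must actually happen."""
    d = make_diagonal(halts)
    predicted_to_halt = halts(d, d)
    actually_halts = not predicted_to_halt   # by the construction of diagonal
    return predicted_to_halt, actually_halts

def always_no(program, data):
    """A deliberately naive candidate decider that always answers 'does not halt'."""
    return False

d = make_diagonal(always_no)
print(d(d))                                   # runs to completion, refuting always_no
print(prediction_versus_behaviour(lambda program, data: True))
```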



4 Quantum versus classical physics

In order to think about quantum information theory, let us
first state the principles of non-relativistic quantum
mechanics, as follows (Shankar 1980).

1. The state of an isolated system Q is represented by a
vector $|\psi(t)\rangle$ in Hilbert space.

2. Variables such as position and momentum are 
termed observables and are represented by Her- 
mitian operators. The position and momentum 
operators X, P have the following matrix elements 
in the eigenbasis of X: 

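$$\langle x|X|x'\rangle = x\,\delta(x - x')$$
$$\langle x|P|x'\rangle = -i\hbar\,\delta'(x - x')$$

3. The state vector obeys the Schrödinger equation

$$i\hbar \frac{d}{dt}|\psi(t)\rangle = \mathcal{H}|\psi(t)\rangle \tag{21}$$

where $\mathcal{H}$ is the quantum Hamiltonian operator.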

4. Measurement postulate. 

The fourth postulate, which has not been made ex- 
plicit, is a subject of some debate, since quite different 
interpretive approaches lead to the same predictions, 
and the concept of 'measurement' is fraught with am- 
biguities in quantum mechanics (Wheeler and Zurek 
1983, Bell 1987, Peres 1993). A statement which is 
valid for most practical purposes is that certain phys- 
ical interactions are recognisably 'measurements', and 
their effect on the state vector $|\psi\rangle$ is to change
it to an eigenstate $|k\rangle$ of the variable being
measured, the value of k being randomly chosen with
probability $P \propto |\langle k|\psi\rangle|^2$. The change
$|\psi\rangle \to |k\rangle$ can be expressed by the
projection operator
$(|k\rangle\langle k|)/\langle k|\psi\rangle$.

Note that according to the above equations, the evolution of
an isolated quantum system is always unitary, in other words
$|\psi(t)\rangle = U(t)|\psi(0)\rangle$ where
$U(t) = \exp(-i\int \mathcal{H}\,dt/\hbar)$ is a unitary
operator, $UU^\dagger = I$. This is true, but there is a
difficulty that there is no such thing as a truly isolated
system (i.e. one which experiences no interactions with any
other systems), except possibly the whole universe.
Therefore there is always some approximation involved in
using the Schrödinger equation to describe real systems.
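
As a numerical illustration of unitary evolution and of the
measurement probabilities, consider a two-state system. The
Hamiltonian $\mathcal{H} = (\hbar\omega/2)\sigma_x$ used below
is an assumption made for this sketch (it is not taken from
the text); for it the evolution operator has the closed form
$U(t) = \cos(\omega t/2)\,I - i\sin(\omega t/2)\,\sigma_x$.

```python
import math

def evolution_operator(omega, t):
    """U(t) = exp(-i H t / hbar) for the assumed H = (hbar * omega / 2) * sigma_x."""
    c = math.cos(omega * t / 2)
    s = math.sin(omega * t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def evolve(u, psi):
    """Apply the operator u to the state vector psi."""
    return [sum(u[i][k] * psi[k] for k in range(2)) for i in range(2)]

def outcome_probabilities(psi):
    """Probability |<k|psi>|^2 of each outcome k in the basis used to write psi."""
    return [abs(amplitude) ** 2 for amplitude in psi]

u = evolution_operator(omega=1.0, t=math.pi / 2)
print(matmul(u, dagger(u)))                       # numerically the identity matrix
print(outcome_probabilities(evolve(u, [1, 0])))   # cos^2 and sin^2 of omega*t/2
```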

One way to handle this approximation is to speak of 
the system Q and its environment T. The evolution 
of Q is primarily that given by its Schrodinger equa- 
tion, but the interaction between Q and T has, in part, 
the character of a measurement of Q. This produces a 
non-unitary contribution to the evolution of Q (since 
projections are not unitary), and this ubiquitous phe- 
nomenon is called decoherence. I have underlined these 
elementary ideas because they are central in what fol- 
lows. 

We can now begin to bring together ideas of physics 
and of information processing. For, it is clear that 
much of the wonderful behaviour we see around us in 
Nature could be understood as a form of information 
processing, and conversely our computers are able to 
simulate, by their processing, many of the patterns of 
Nature. The obvious, if somewhat imprecise, questions 
are 

1. "can Nature usefully be regarded as essentially an 
information processor?" 

2. "could a computer simulate the whole of Nature?" 

The principles of quantum mechanics suggest that the answer
to the first question is yes*. For, the state vector so
central to quantum mechanics is a concept very much like
those of information science: it is an abstract entity which
contains exactly all the information about the system Q. The
word 'exactly' here is a reminder that not only is
$|\psi\rangle$ a complete description of Q, it is also one
that does not contain any extraneous information which
cannot meaningfully be associated with Q. The importance of
this in quantum statistics of Fermi and Bose gases was
mentioned in the introduction.

The second question can be made more precise by converting
the Church-Turing thesis into a principle of physics,

Every finitely realizable physical system can be simulated
arbitrarily closely by a universal model computing machine
operating by finite means.

*This does not necessarily imply that such language captures
everything that can be said about Nature, merely that this
is a useful abstraction at the descriptive level of physics.
I do not believe any physical 'laws' could be adequate to
completely describe human behaviour, for example, since they
are sufficiently approximate or non-prescriptive to leave us
room for manoeuvre (Polkinghorne 1994).

This statement is based on that of Deutsch (1985). The idea
is to propose that a principle like this is not derived from
quantum mechanics, but rather underpins it, like other
principles such as that of conservation of energy. The
qualifications introduced by 'finitely realizable' and
'finite means' are important in order to state something
useful.

The new version of the Church-Turing thesis (now called the
'Church-Turing Principle') does not refer to Turing
machines. This is important because there are
fundamental differences between the very nature of the 
Turing machine and the principles of quantum mechan- 
ics. One is described in terms of operations on classical 
bits, the other in terms of evolution of quantum states. 
Hence there is the possibility that the universal Turing 
machine, and hence all classical computers, might not 
be able to simulate some of the behaviour to be found 
in Nature. Conversely, it may be physically possible 
(i.e. not ruled out by the laws of Nature) to realise a 
new type of computation essentially different from that 
of classical computer science. This is the central aim 
of quantum computing. 



4.1 EPR paradox, Bell's inequality

In 1935 Einstein, Podolsky and Rosen (EPR) drew attention to
an important feature of non-relativistic quantum mechanics.
Their argument, and Bell's analysis, can now be recognised
as one of the seeds from which quantum information theory
has grown. The EPR paradox should be familiar to any physics
graduate, and I will not repeat the argument in detail.
However, the main points will provide a useful way in to
quantum information concepts.

The EPR thought-experiment can be reduced in essence to an
experiment involving pairs of two-state quantum systems
(Bohm 1951, Bohm and Aharonov 1957). Let us consider a pair
of spin-half particles A and B, writing the ($m_z = +1/2$)
spin 'up' state $|\uparrow\rangle$ and the ($m_z = -1/2$)
spin 'down' state $|\downarrow\rangle$. The particles are
prepared initially in the singlet state
$(|\uparrow\rangle|\downarrow\rangle - |\downarrow\rangle|\uparrow\rangle)/\sqrt{2}$,
and they subsequently fly apart, propagating in opposite
directions along the y-axis. Alice and Bob are widely
separated, and they receive particle A and B respectively.
EPR were concerned with whether quantum mechanics provides a
complete description of the particles, or whether something
was left out, some property of the spin angular momenta
$\mathbf{s}_A$, $\mathbf{s}_B$ which quantum theory failed to
describe. Such a property has since become known as a
'hidden variable'. They argued that something was left out,
because this experiment allows one to predict with certainty
the result of measuring any component of $\mathbf{s}_B$,
without causing any disturbance of B. Therefore all the
components of $\mathbf{s}_B$ have definite values, say EPR,
and the quantum theory only provides an incomplete
description. To make the certain prediction without
disturbing B, one chooses any axis $\eta$ along which one
wishes to know B's angular momentum, and then measures not B
but A, using a Stern-Gerlach apparatus aligned along $\eta$.
Since the singlet state carries no net angular momentum, one
can be sure that the corresponding measurement on B would
yield the opposite result to the one obtained for A.

The EPR paper is important because it is carefully ar- 
gued, and the fallacy is hard to unearth. The fallacy 
can be exposed in one of two ways: one can say either 
that Alice's measurement does influence Bob's particle, 
or (which I prefer) that the quantum state vector
$|\psi\rangle$ is not an intrinsic property of a quantum
system, but an expression for the information content of a
quantum variable. In a singlet state there is mutual
information between A and B, so the information content of
B changes when we learn something about A. So far 
there is no difference from the behaviour of classical 
information, so nothing surprising has occurred. 

A more thorough analysis of the EPR experiment yields a big
surprise. This was discovered by Bell (1964, 1966). Suppose
Alice and Bob measure the spin component of A and B along
different axes $\eta_A$ and $\eta_B$ in the x-z plane. Each
measurement yields an answer + or −. Quantum theory and
experiment agree that the probability for the two
measurements to yield the same result is
$\sin^2((\phi_A - \phi_B)/2)$, where $\phi_A$ ($\phi_B$) is
the angle between $\eta_A$ ($\eta_B$) and the z axis.
However, there is no way to assign local properties, that is
properties of A and B independently, which lead to this high
a correlation, in which the results are certain to be
opposite when $\phi_A = \phi_B$, certain to be equal when
$\phi_A = \phi_B + 180^\circ$, and also, for example, have a
$\sin^2(60^\circ) = 3/4$ chance of being equal when
$\phi_A - \phi_B = 120^\circ$. Feynman (1982) gives a
particularly clear analysis. At
$\phi_A - \phi_B = 120^\circ$ the highest correlation which
local hidden variables could produce is 2/3.
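
The figure of 2/3 can be reproduced by brute force in a
simplified setting, sketched below under stated assumptions:
each observer chooses among three axes at 0°, 120° and 240°;
a local hidden variable fixes in advance A's answer for each
axis; and B's answers are the opposite ones, as required for
certain anticorrelation when the axes coincide. The function
names are illustrative, and the counting argument follows
the spirit of Feynman's analysis rather than reproducing it.

```python
import math
from itertools import product

def quantum_same(phi_a_deg, phi_b_deg):
    """Quantum probability sin^2((phi_A - phi_B)/2) that the two results agree."""
    return math.sin(math.radians(phi_a_deg - phi_b_deg) / 2) ** 2

def best_local_same():
    """Largest fraction of agreeing results, over axis pairs 120 degrees apart,
    that any predetermined (local) assignment of answers can achieve."""
    axes = range(3)                                   # 0, 120 and 240 degrees
    pairs = [(a, b) for a in axes for b in axes if a != b]
    best = 0.0
    for answers_a in product((+1, -1), repeat=3):
        answers_b = tuple(-r for r in answers_a)      # opposite results on equal axes
        same = sum(answers_a[a] == answers_b[b] for a, b in pairs)
        best = max(best, same / len(pairs))
    return best

print(quantum_same(0, 120), best_local_same())
```

A probabilistic mixture of such assignments cannot exceed the
best single assignment, so the bound applies to any local
hidden-variable model of this kind, while the quantum value
at 120° is 3/4.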

The Bell-EPR argument allows us to identify a task which is
physically possible, but which no classical computer could
perform: when repeatedly given inputs $\phi_A$, $\phi_B$ at
completely separated locations, respond quickly (i.e. too
quickly to allow light-speed communication between the
locations) with yes/no responses which are perfectly
correlated when $\phi_A - \phi_B = 180^\circ$, anticorrelated
when $\phi_A = \phi_B$, and more than ~70% correlated when
$\phi_A - \phi_B = 120^\circ$.

Experimental tests of Bell's argument were carried out in
the 1970's and 80's and the quantum theory was verified
(Clauser and Shimony 1978, Aspect et al. 1982; for more
recent work see Aspect (1991), Kwiat et al. 1995 and
references therein). This was a significant new probe into
the logical structure of quantum mechanics. The argument can
be made even stronger by considering a more complicated
system. In particular, for three spins prepared in a state
such as
$(|\uparrow\rangle|\uparrow\rangle|\uparrow\rangle + |\downarrow\rangle|\downarrow\rangle|\downarrow\rangle)/\sqrt{2}$,
Greenberger, Horne and Zeilinger (1989) (GHZ) showed that a
single measurement along a horizontal axis for two
particles, and along a vertical axis for the third, will
yield with certainty a result which is the exact opposite of
what a local hidden-variable theory would predict. A wider
discussion and references are provided by Greenberger et al.
(1990), Mermin (1990).

The Bell-EPR correlations show that quantum me- 
chanics permits at least one simple task which is be- 
yond the capabilities of classical computers, and they 
hint at a new type of mutual information (Schumacher 
and Nielsen 1996). In order to pursue these ideas, we 
will need to construct a complete theory of quantum 
information. 



5 Quantum Information 



Just as in the discussion of classical information the- 
ory, quantum information ideas are best introduced by 
stating them, and then showing afterwards how they 
link together. Quantum communication is treated in a 
special issue of J. Mod. Opt., volume 41 (1994); reviews and
references for quantum cryptography are given by Bennett
et al. (1992); Hughes et al. (1995); Phoenix and Townsend
(1995); Brassard and Crepeau (1996); Ekert (1997). Spiller
(1996) reviews both communication and computing.



5.1 Qubits 

The elementary unit of quantum information is the qubit
(Schumacher 1995). A single qubit can be envisaged as a
two-state system such as a spin-half or a two-level atom
(see fig. 12), but when we measure quantum information in
qubits we are really doing something more abstract: a
quantum system is said to have n qubits if it has a Hilbert
space of $2^n$ dimensions, and so has available $2^n$
mutually orthogonal quantum states (recall that n classical
bits can represent up to $2^n$ different things). This
definition of the qubit will be elaborated in section 5.6.

We will write two orthogonal states of a single qubit as
$\{|0\rangle, |1\rangle\}$. More generally, $2^n$ mutually
orthogonal states of n qubits can be written
$\{|i\rangle\}$, where i is an n-bit binary number. For
example, for three qubits we have
$\{|000\rangle, |001\rangle, |010\rangle, |011\rangle,
|100\rangle, |101\rangle, |110\rangle, |111\rangle\}$.



5.2 Quantum gates 

Simple unitary operations on qubits are called quantum
'logic gates' (Deutsch 1985, 1989). For example, if a qubit
evolves as $|0\rangle \to |0\rangle$,
$|1\rangle \to \exp(i\omega t)|1\rangle$, then after time t
we may say that the operation, or 'gate'

$$P(\theta) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix} \tag{22}$$


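has been applied to the qubit, where $\theta = \omega t$.
This can also be written
$P(\theta) = |0\rangle\langle 0| + \exp(i\theta)|1\rangle\langle 1|$.
Here are some other elementary quantum gates:

$$I \equiv |0\rangle\langle 0| + |1\rangle\langle 1| = \text{identity} \tag{23}$$
$$X \equiv |0\rangle\langle 1| + |1\rangle\langle 0| = \text{NOT} \tag{24}$$
$$Z \equiv P(\pi) \tag{25}$$
$$Y \equiv XZ \tag{26}$$
$$H \equiv \frac{1}{\sqrt{2}}\left[(|0\rangle + |1\rangle)\langle 0| + (|0\rangle - |1\rangle)\langle 1|\right] \tag{27}$$
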

These all act on a single qubit, and can be achieved by the
action of some Hamiltonian in Schrödinger's equation, since
they are all unitary operators*. There are an infinite
number of single-qubit quantum gates, in contrast to
classical information theory, where only two logic gates are
possible for a single bit, namely the identity and the
logical NOT operation. The quantum NOT gate carries
$|0\rangle$ to $|1\rangle$ and vice versa, and so is
analogous to a classical NOT. This gate is also called X
since it is the Pauli $\sigma_x$ operator. Note that the set
{I, X, Y, Z} is a group under multiplication, up to an
overall sign (for example $Y^2 = -I$ when $Y = XZ$).
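
The gates of equations (22) to (27) can be written as 2×2
matrices and their properties checked numerically. The
sketch below assumes the column-vector convention in which
$|0\rangle = (1, 0)$ and $|1\rangle = (0, 1)$; the helper names
are illustrative.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def P(theta):
    """Phase gate of equation (22)."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]

r = 1 / math.sqrt(2)
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]      # |0><1| + |1><0|
Z = P(math.pi)            # equation (25)
Y = matmul(X, Z)          # equation (26)
H = [[r, r], [r, -r]]     # equation (27)
MINUS_I = [[-1, 0], [0, -1]]

for name, gate in (("I", I), ("X", X), ("Y", Y), ("Z", Z), ("H", H)):
    print(name, "unitary:", close(matmul(gate, dagger(gate)), I))
print("Y*Y = -I:", close(matmul(Y, Y), MINUS_I))
print("H X H = Z:", close(matmul(H, matmul(X, H)), Z))
```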

Of all the possible unitary operators acting on a pair of
qubits, an interesting subset is those which can be written
$|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$,
where I is the single-qubit identity operation, and U is
some other single-qubit gate. Such a two-qubit gate is
called a "controlled U" gate, since the action I or U on the
second qubit is controlled by whether the first qubit is in
the state $|0\rangle$ or $|1\rangle$. For example, the effect
of controlled-NOT ("CNOT") is

$$\begin{aligned}
|00\rangle &\to |00\rangle \\
|01\rangle &\to |01\rangle \\
|10\rangle &\to |11\rangle \\
|11\rangle &\to |10\rangle
\end{aligned} \tag{28}$$

Here the second qubit undergoes a NOT if and only if the
first qubit is in the state $|1\rangle$. This list of state
changes is the analogue of the truth table for a classical
binary logic gate. The effect of controlled-NOT acting on a
state $|a\rangle|b\rangle$ can be written $a \to a$,
$b \to a \oplus b$, where $\oplus$ signifies the exclusive or
(XOR) operation. For this reason, this gate is also called
the XOR gate.
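
On the basis states the controlled-NOT is just a reversible
classical operation, which a few lines can tabulate. This
sketch treats only basis-state labels (bits), not
superpositions; the function name is illustrative.

```python
def cnot(a, b):
    """Action on basis-state labels: a -> a, b -> a XOR b."""
    return a, a ^ b

for a in (0, 1):
    for b in (0, 1):
        out_a, out_b = cnot(a, b)
        print(f"|{a}{b}> -> |{out_a}{out_b}>")
```

Applying the gate twice restores the input, as expected for a
gate that must be unitary.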

*The letter H is adopted for the final gate here because its
effect is a Hadamard transformation. This is not to be
confused with the Hamiltonian $\mathcal{H}$.

Other logical operations require further qubits. For
example, the AND operation is achieved by use of the 3-qubit
"controlled-controlled-NOT" gate, in which the third qubit
experiences NOT if and only if both the others are in the
state $|1\rangle$. This gate is named a Toffoli gate, after
Toffoli (1980) who showed that the classical version is
universal for classical reversible computation. The effect
on a state $|a\rangle|b\rangle|0\rangle$ is $a \to a$,
$b \to b$, $0 \to a \cdot b$. In other words if the third
qubit is prepared in $|0\rangle$ then this gate computes the
AND of the first two qubits. The use of three qubits is
necessary in order to permit the whole operation to be
unitary, and thus allowed in quantum mechanical evolution.
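
The same bookkeeping for the Toffoli gate shows both the AND
computation and the reversibility that the third qubit buys.
As before, this sketch acts on basis-state labels only, and
the function name is illustrative.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT on basis-state labels: c is flipped iff a = b = 1."""
    return a, b, c ^ (a & b)

inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

for a, b in product((0, 1), repeat=2):
    print(f"a={a} b={b} -> third qubit {toffoli(a, b, 0)[2]}")   # equals a AND b
print("reversible:", sorted(outputs) == inputs)                  # a permutation of the 8 states
```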

It is an amusing exercise to find the combinations of gates
which perform elementary arithmetical operations such as
binary addition and multiplication. Many basic constructions
are given by Barenco et al. (1995b); further general design
considerations are discussed by Vedral et al. (1996) and
Beckman et al. (1996).

The action of a sequence of quantum gates can be written in
operator notation, for example
$X_1 H_2 \mathrm{XOR}_{1,3} |\phi\rangle$ where
$|\phi\rangle$ is some state of three qubits, and the
subscripts on the operators indicate to which qubits they
apply. However, once more than a few quantum gates are
involved, this notation is rather obscure, and can usefully
be replaced by a diagram known as a quantum network (see
fig. 8). These diagrams will be used hereafter.



5.3 No cloning 



No cloning theorem: An unknown quantum state can- 
not be cloned. 

This states that it is impossible to generate copies of a
quantum state reliably, unless the state is already known
(i.e. unless there exists classical information which
specifies it). Proof: to generate a copy of a quantum state
$|\alpha\rangle$, we must cause a pair of quantum systems to
undergo the evolution
$U(|\alpha\rangle|0\rangle) = |\alpha\rangle|\alpha\rangle$
where U is the unitary evolution operator. If this is to
work for any state, then U must not depend on $\alpha$, and
therefore
$U(|\beta\rangle|0\rangle) = |\beta\rangle|\beta\rangle$ for
$|\beta\rangle \neq |\alpha\rangle$. However, if we consider
the state
$|\gamma\rangle = (|\alpha\rangle + |\beta\rangle)/\sqrt{2}$,
we have
$U(|\gamma\rangle|0\rangle) = (|\alpha\rangle|\alpha\rangle + |\beta\rangle|\beta\rangle)/\sqrt{2} \neq |\gamma\rangle|\gamma\rangle$
so the cloning operation fails. This argument applies to any
purported cloning method (Wootters and Zurek 1982, Dieks
1982).

Note that any given 'cloning' operation U can work on some
states ($|\alpha\rangle$ and $|\beta\rangle$ in the above
example), though since U is unitary, and so preserves inner
products, two different clonable states must be orthogonal,
$\langle\alpha|\beta\rangle = 0$. Unless we already know that
the state to be copied is one of these states, we cannot
guarantee that the chosen U will correctly clone it. This is
in contrast to classical information, where machines like
photocopiers can easily copy whatever classical information
is sent to them. The controlled-NOT or XOR operation of
equation (28) is a copying operation for the states
$|0\rangle$ and $|1\rangle$, but not for states such as
$|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ and
$|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$.
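
This last remark is easy to verify numerically. The sketch
below applies the controlled-NOT to
$|\psi\rangle|0\rangle$ and compares the result with the
ideal clone $|\psi\rangle|\psi\rangle$; the two-qubit
amplitudes are ordered as 00, 01, 10, 11, and the function
names are illustrative.

```python
import math

def kron(u, v):
    """Amplitudes of the product state |u>|v>, ordered 00, 01, 10, 11."""
    return [x * y for x in u for y in v]

def cnot(state):
    """Controlled-NOT with the first qubit as control: swaps the 10 and 11 amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def overlap(u, v):
    """|<u|v>|^2 for two normalised state vectors."""
    return abs(sum(complex(x).conjugate() * y for x, y in zip(u, v))) ** 2

def clone_fidelity(psi):
    """Overlap between CNOT(|psi>|0>) and the ideal clone |psi>|psi>."""
    return overlap(kron(psi, psi), cnot(kron(psi, [1.0, 0.0])))

r = 1 / math.sqrt(2)
for name, psi in (("0", [1.0, 0.0]), ("1", [0.0, 1.0]), ("+", [r, r]), ("-", [r, -r])):
    print(name, round(clone_fidelity(psi), 6))
```

For $|+\rangle$ and $|-\rangle$ the gate produces an entangled
state rather than two copies, so the overlap with the ideal
clone falls well below one.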

The no-cloning theorem and the EPR paradox together 
reveal a rather subtle way in which non-relativistic 
quantum mechanics is a consistent theory. For, if 
cloning were possible, then EPR correlations could be 
used to communicate faster than light, which leads 
to a contradiction (an effect preceding a cause) once 
the principles of special relativity are taken into account.
To see this, observe that by generating many clones, and
then measuring them in different bases, Bob could deduce
unambiguously whether his member of an EPR pair is in a
state of the basis $\{|0\rangle, |1\rangle\}$ or of the basis
$\{|+\rangle, |-\rangle\}$. Alice would communicate
instantaneously by forcing the EPR pair into one basis or
the other through her choice of measurement axis (Glauber
1986).



5.4 Dense coding 

We will discuss the following statement: 

Quantum entanglement is an information resource. 

Qubits can be used to store and transmit classical
information. To transmit a classical bit string 00101, for
example, Alice can send 5 qubits prepared in the state
$|00101\rangle$. The receiver Bob can extract the information
by measuring each qubit in the basis
$\{|0\rangle, |1\rangle\}$ (i.e. these are the eigenstates of
the measured observable). The measurement results yield the
classical bit string with no ambiguity. No more than one
classical bit can be communicated for each qubit sent.



Suppose now that Alice and Bob are in possession of an
entangled pair of qubits, in the state
$|00\rangle + |11\rangle$ (we will usually drop normalisation
factors such as $\sqrt{2}$ from now on, to keep the notation
uncluttered). Alice and Bob need never have communicated: we
imagine a mechanical central facility generating entangled
pairs and sending one qubit to each of Alice and Bob, who
store them (see fig. 9a). In this situation, Alice can
communicate two classical bits by sending Bob only one qubit
(namely her half of the entangled pair). This idea due to
Wiesner (Bennett and Wiesner 1992) is called "dense coding",
since only one quantum bit travels from Alice to Bob in
order to convey two classical bits. Two quantum bits are
involved, but Alice only ever sees one of them. The method
relies on the following fact: the four mutually orthogonal
states $|00\rangle + |11\rangle$, $|00\rangle - |11\rangle$,
$|01\rangle + |10\rangle$, $|01\rangle - |10\rangle$ can be
generated from each other by operations on a single qubit.
This set of states is called the Bell basis, since they
exhibit the strongest possible Bell-EPR correlations
(Braunstein et al. 1992). Starting from
$|00\rangle + |11\rangle$, Alice can generate any of the Bell
basis states by operating on her qubit with one of the
operators {I, X, Y, Z}. Since there are four possibilities,
her choice of operation represents two bits of classical
information. She then sends her qubit to Bob, who must
deduce which Bell basis state the qubits are in. This he
does by operating on the pair with the XOR gate, and
measuring the target bit, thus distinguishing
$|00\rangle \pm |11\rangle$ from $|01\rangle \pm |10\rangle$.
To find the sign in the superposition, he operates with H on
the remaining qubit, and measures it. Hence Bob obtains two
classical bits with no ambiguity.
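
The protocol can be simulated with a four-component state
vector. In the sketch below Alice's qubit is taken to be the
first of the pair and the control of Bob's XOR gate; the
particular assignment of bit pairs to the operators I, X, Z,
Y is an assumption of this illustration (any fixed
assignment agreed in advance would do), as are the function
names.

```python
import math

R = 1 / math.sqrt(2)
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
Y = [[0, -1], [1, 0]]      # Y = XZ, as in equation (26)
H = [[R, R], [R, -R]]

# Assumed convention: two classical bits select Alice's single-qubit operation.
ENCODE = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): Y}

def on_first(gate, state):
    """Apply a single-qubit gate to the first qubit; amplitudes ordered 00, 01, 10, 11."""
    return [sum(gate[i][k] * state[2 * k + j] for k in range(2))
            for i in range(2) for j in range(2)]

def cnot(state):
    """XOR gate with the first qubit as control and the second as target."""
    return [state[0], state[1], state[3], state[2]]

def dense_code(bits):
    """Send two classical bits using one qubit of a shared |00> + |11> pair."""
    state = [R, 0, 0, R]                        # shared entangled pair, normalised
    state = on_first(ENCODE[bits], state)       # Alice acts on her qubit and sends it
    state = on_first(H, cnot(state))            # Bob: XOR, then H on the control qubit
    probabilities = [abs(a) ** 2 for a in state]
    k = max(range(4), key=probabilities.__getitem__)
    assert probabilities[k] > 1 - 1e-9          # Bob's measurement outcome is certain
    return (k >> 1, k & 1)

for bits in ENCODE:
    print(bits, "->", dense_code(bits))
```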

Dense coding is difficult to implement, and so has no 
practical value merely as a standard communication 
method. However, it can permit secure communica- 
tion: the qubit sent by Alice will only yield the two 
classical information bits to someone in possession of 
the entangled partner qubit. More generally, dense 
coding is an example of the statement which began 
this section. It reveals a relationship between classi- 
cal information, qubits, and the information content of 
quantum entanglement (Barenco and Ekert 1995). A 
laboratory demonstration of the main features is described
by Mattle et al. (1996); Weinfurter (1994) and Braunstein
and Mann (1995) discuss some of the methods employed, based
on a source of EPR photon pairs from parametric
down-conversion.



5.5 Quantum teleportation

It is possible to transmit qubits without sending qubits! 

Suppose Alice wishes to communicate to Bob a single qubit in
the state $|\phi\rangle$. If Alice already knows what state
she has, for example $|\phi\rangle = |0\rangle$, she can
communicate it to Bob by sending just classical information,
e.g. "Dear Bob, I have the state $|0\rangle$. Regards,
Alice." However, if $|\phi\rangle$ is unknown there is no way
for Alice to learn it with certainty: any measurement she
may perform may change the state, and she cannot clone it
and measure the copies. Hence it appears that the only way
to transmit $|\phi\rangle$ to Bob is to send him the physical
qubit (i.e. the electron or atom or whatever), or possibly
to swap the state into another quantum system and send that.
In either case a quantum system is transmitted.

Quantum teleportation (Bennett et al. 1993, Bennett 1995)
permits a way around this limitation. As in dense coding, we
will use quantum entanglement as an information resource.
Suppose Alice and Bob possess an entangled pair in the state
$|00\rangle + |11\rangle$. Alice wishes to transmit to Bob a
qubit in an unknown state $|\phi\rangle$. Without loss of
generality, we can write
$|\phi\rangle = a|0\rangle + b|1\rangle$ where a and b are
unknown coefficients. Then the initial state of all three
qubits is

$$a|000\rangle + b|100\rangle + a|011\rangle + b|111\rangle \tag{29}$$

Alice now measures in the Bell basis the first two qubits,
i.e. the unknown one and her member of the entangled pair.
The network to do this is shown in fig. 9b. After Alice has
applied the XOR and Hadamard gates, and just before she
measures her qubits, the state is

$$\begin{aligned}
&|00\rangle(a|0\rangle + b|1\rangle) + |01\rangle(a|1\rangle + b|0\rangle) \\
{}+{} &|10\rangle(a|0\rangle - b|1\rangle) + |11\rangle(a|1\rangle - b|0\rangle).
\end{aligned} \tag{30}$$

Alice's measurements collapse the state onto one of four
different possibilities, and yield two classical bits. The
two bits are sent to Bob, who uses them to learn which of
the operators {I, X, Z, Y} he must apply to his qubit in
order to place it in the state
$a|0\rangle + b|1\rangle = |\phi\rangle$. Thus Bob ends up
with the qubit (i.e. the quantum information, not the actual
quantum system) which Alice wished to transmit.
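
Equations (29) and (30) can be checked with an
eight-component state vector. In this sketch the qubits are
ordered as in the text (unknown qubit, Alice's half of the
pair, Bob's half); Bob's correction is implemented as X when
Alice's second bit is 1 followed by Z when her first bit is
1, which for the outcome 11 equals Y up to an irrelevant
overall sign. The function names are illustrative.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]

def apply_1q(gate, state, qubit):
    """Apply a single-qubit gate to one of three qubits (qubit 0 is the leftmost)."""
    shift = 2 - qubit
    out = [0j] * 8
    for idx in range(8):
        bit = (idx >> shift) & 1
        for new_bit in range(2):
            j = (idx & ~(1 << shift)) | (new_bit << shift)
            out[j] += gate[new_bit][bit] * state[idx]
    return out

def cnot_01(state):
    """XOR gate with qubit 0 as control and qubit 1 as target."""
    out = [0j] * 8
    for idx in range(8):
        out[(idx ^ 2) if (idx & 4) else idx] = state[idx]
    return out

def teleport(a, b):
    """Return {(m1, m2): (probability, Bob's corrected state)} for |phi> = a|0> + b|1>."""
    state = [0j] * 8
    for q, amp in ((0, a), (1, b)):
        state[4 * q + 0] += amp * R      # |q>|00>
        state[4 * q + 3] += amp * R      # |q>|11>
    state = apply_1q(H, cnot_01(state), 0)
    results = {}
    for m1 in (0, 1):
        for m2 in (0, 1):
            base = 4 * m1 + 2 * m2
            bob = [state[base], state[base + 1]]
            prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
            bob = [x / math.sqrt(prob) for x in bob]
            if m2:
                bob = [bob[1], bob[0]]       # apply X
            if m1:
                bob = [bob[0], -bob[1]]      # apply Z
            results[(m1, m2)] = (prob, bob)
    return results

for outcome, (prob, bob) in teleport(0.6, 0.8j).items():
    print(outcome, round(prob, 3), bob)
```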

Note that the quantum information can only arrive at Bob if
it disappears from Alice (no cloning). Also, quantum
information is complete information: $|\phi\rangle$ is the
complete description of Alice's qubit. The use of the word
'teleportation' draws attention to these two
facts. Teleportation becomes an especially important 
idea when we come to consider communication in the 
presence of noise, section ^. 

5.6 Quantum data compression 

Having introduced the qubit, we now wish to show 
that it is a useful measure of quantum information con- 
tent. The proof of this is due to Jozsa and Schumacher 
(1994) and Schumacher (1995), building on work of 
Kholevo (1973) and Levitin (1987). To begin the ar- 
gument, we first need a quantity which expresses how 
much information you would gain if you were to learn 
the quantum state of some system Q. A suitable quan- 
tity is the Von Neumann entropy 

$$S(\rho) = -\mathrm{Tr}\,\rho \log \rho \tag{31}$$

where Tr is the trace operation, and $\rho$ is the density
operator describing an ensemble of states of the quan- 
tum system. This is to be compared with the classi- 
cal Shannon entropy, equation (|^). Suppose a classi- 
cal random variable X has a probability distribution 
$p(x)$. If a quantum system is prepared in a state
$|x\rangle$ dictated by the value of X, then the density
matrix is $\rho = \sum_x p(x)|x\rangle\langle x|$, where the
states $|x\rangle$ need not be orthogonal. It can be shown
(Kholevo 1973, Levitin 1987) that $S(\rho)$ is an upper limit
on the classical mutual information $I(X : Y)$ between X and
the result Y of a measurement on the system.

To make connection with qubits, we consider the resources
needed to store or transmit the state of a quantum system q
of density matrix $\rho$. The idea is to collect $n \gg 1$
such systems, and transfer ('encode') the joint state into
some smaller system. The smaller system is transmitted down
the channel, and at the receiving end the joint state is
'decoded' into n systems q' of the same type as q (see
fig. 9c). The final density matrix of each q' is $\rho'$, and
the whole process is deemed successful if $\rho'$ is
sufficiently close to $\rho$. The measure of the similarity
between two density matrices is the fidelity defined by

$$f(\rho, \rho') = \left(\mathrm{Tr}\sqrt{\rho^{1/2}\rho'\rho^{1/2}}\right)^2 \tag{32}$$

This can be interpreted as the probability that q' passes a
test which ascertained if it was in the state $\rho$. When
$\rho$ and $\rho'$ are both pure states,
$|\phi\rangle\langle\phi|$ and $|\phi'\rangle\langle\phi'|$,
the fidelity is none other than the familiar overlap:
$f = |\langle\phi|\phi'\rangle|^2$.


Our aim is to find the smallest transmitted system which
permits $f = 1 - \epsilon$ for $\epsilon \ll 1$. The argument
is analogous to the 'typical sequences' idea used in section
2.2. Restricting ourselves for simplicity to two-state
systems, the total state of n systems is represented by a
vector in a Hilbert space of $2^n$ dimensions. However, if
the von Neumann entropy $S(\rho) < 1$ then it is highly
likely (i.e. tends to certainty in the limit of large n)
that, in any given realisation, the state vector actually
falls in a typical sub-space of Hilbert space. Schumacher
and Jozsa showed that the dimension of the typical sub-space
is $2^{nS(\rho)}$. Hence only $nS(\rho)$ qubits are required
to represent the quantum information faithfully, and the
qubit (i.e. the logarithm of the dimensionality of Hilbert
space) is a useful measure of quantum information.
Furthermore, the encoding and decoding operation is 'blind':
it does not depend on knowledge of the exact states being
transmitted.
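
For a single two-state system the von Neumann entropy of
equation (31) can be evaluated from the two eigenvalues of
$\rho$. The sketch below uses base-2 logarithms so that the
answer is in qubits per system, and takes as an illustrative
assumption an ensemble in which $|0\rangle$ and
$(|0\rangle + |1\rangle)/\sqrt{2}$ are each prepared with
probability 1/2; the function names are hypothetical.

```python
import math

def density_matrix(ensemble):
    """rho = sum_x p(x) |x><x| for a list of (probability, [amp0, amp1]) pure states."""
    rho = [[0j, 0j], [0j, 0j]]
    for p, psi in ensemble:
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * psi[i] * complex(psi[j]).conjugate()
    return rho

def von_neumann_entropy(rho):
    """S(rho) = -Tr rho log2 rho, using the closed-form eigenvalues of a 2x2 matrix."""
    a, d, b = rho[0][0].real, rho[1][1].real, rho[0][1]
    root = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    eigenvalues = [(a + d + root) / 2, (a + d - root) / 2]
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-15)

r = 1 / math.sqrt(2)
non_orthogonal = [(0.5, [1, 0]), (0.5, [r, r])]
orthogonal = [(0.5, [1, 0]), (0.5, [0, 1])]

S = von_neumann_entropy(density_matrix(non_orthogonal))
n = 1000
print("S(rho) for the non-orthogonal ensemble:", round(S, 4))
print("qubits needed for n = 1000 systems, about:", math.ceil(n * S))
print("S(rho) for the orthogonal ensemble:", von_neumann_entropy(density_matrix(orthogonal)))
```

The non-orthogonal ensemble has $S(\rho) < 1$ even though the
classical label of each preparation carries one full bit,
which is why fewer than n qubits suffice in the limit of
large n.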

Schumacher and Jozsa's result is powerful because it
is general: no assumptions are made about the exact 
nature of the quantum states involved. In particular, 
they need not be orthogonal. If the states to be trans- 
mitted were mutually orthogonal, the whole problem 
would reduce to one of classical information. 

The 'encoding' and 'decoding' required to achieve such 
quantum data compression and decompression is tech- 
nologically very demanding. It cannot at present be 
done at all using photons. However, it is the ultimate 
compression allowed by the laws of physics. The details 
of the required quantum networks have been deduced 
by Cleve and DiVincenzo (1996). 

As well as the essential concept of information, other 
classical ideas such as Huffman coding have their quantum
counterparts. Furthermore, Schumacher and Nielsen (1996)
derive a quantity which they call 'coherent
information' which is a measure of mutual informa- 
tion for quantum systems. It includes that part of the 
mutual information between entangled systems which 
cannot be accounted for classically. This is a helpful 
way to understand the Bell-EPR correlations. 



5.7 Quantum cryptography 

No overview of quantum information is complete with- 
out a mention of quantum cryptography. This area 
stems from an unpublished paper of Wiesner written 
around 1970 (Wiesner 1983). It includes various ideas 
whereby the properties of quantum systems are used to 
achieve useful cryptographic tasks, such as secure (i.e. 
secret) communication. The subject may be divided 
into quantum key distribution, and a collection of other 
ideas broadly related to bit commitment. Quantum 
key distribution will be outlined below. Bit commit- 
ment refers to the scenario in which Alice must make 
some decision, such as a vote, in such a way that Bob 
can be sure that Alice fixed her vote before a given 
time, but where Bob can only learn Alice's vote at some 
later time which she chooses. A classical, cumbersome 
method to achieve bit commitment is for Alice to write 
down her vote and place it in a safe which she gives to 
Bob. When she wishes Bob, later, to learn the infor- 
mation, she gives him the key to the safe. A typical 
quantum protocol is a carefully constructed variation 
on the idea that Alice provides Bob with a prepared 
qubit, and only later tells him in what basis it was 
prepared. 

The early contributions to the field of quantum cryptography
were listed in the introduction; further references
may be found in the reviews mentioned at the beginning
of this section. Cryptography has the unusual
feature that it is not possible to prove by experiment 
that a cryptographic procedure is secure: who knows 
whether a spy or cheating person managed to beat the 
system? Instead, the users' confidence in the methods 
must rely on mathematical proofs of security, and it 
is here that much important work has been done. A 
concerted effort has enabled proofs to be established 
for the security of correctly implemented quantum key 
distribution. However, the bit commitment idea, long 
thought to be secure through quantum methods, was 
recently proved to be insecure (Mayers 1997, Lo and 
Chau 1997) because the participants can cheat by mak- 
ing use of quantum entanglement. 

Quantum key distribution is a method in which quan- 
tum states are used to establish a random secret key for 
cryptography. The essential ideas are as follows: Alice 
and Bob are, as usual, widely separated and wish to
communicate. Alice sends to Bob 2n qubits, each prepared
in one of the states $|0\rangle$, $|1\rangle$, $|+\rangle$, $|-\rangle$, randomly
chosen^. Bob measures his received qubits, choosing the
measurement basis randomly between $\{|0\rangle, |1\rangle\}$ and
$\{|+\rangle, |-\rangle\}$. Next, Alice and Bob inform each other
publicly (i.e. anyone can listen in) of the basis they
used to prepare or measure each qubit. They find out
on which occasions they by chance used the same basis,
which happens on average half the time, and retain just
those results. In the absence of errors or interference,
they now share the same random string of n classical
bits (they agree for example to associate $|0\rangle$ and $|+\rangle$
with 0; $|1\rangle$ and $|-\rangle$ with 1). This classical bit string is
often called the raw quantum transmission, RQT.
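
The sifting step just described can be sketched as a short classical simulation. This is only a toy model under stated assumptions: measurement outcomes are drawn from the ideal quantum probabilities (certain when the bases match, 50/50 otherwise), there is no noise and no eavesdropper, and the function name bb84_sift is hypothetical.

```python
import random

def bb84_sift(num_qubits, rng):
    """Simulate preparation, measurement and basis sifting with no eavesdropper.

    Basis 0 is {|0>, |1>}, basis 1 is {|+>, |->}.
    Bit 0 stands for |0> or |+>, bit 1 for |1> or |->.
    Returns Alice's and Bob's sifted bit strings (the raw quantum transmission).
    """
    alice_key, bob_key = [], []
    for _ in range(num_qubits):
        alice_bit = rng.randint(0, 1)
        alice_basis = rng.randint(0, 1)
        bob_basis = rng.randint(0, 1)
        # Same basis: Bob's outcome is certain. Different basis: it is 50/50.
        bob_bit = alice_bit if bob_basis == alice_basis else rng.randint(0, 1)
        if bob_basis == alice_basis:   # the public comparison keeps these cases only
            alice_key.append(alice_bit)
            bob_key.append(bob_bit)
    return alice_key, bob_key

rng = random.Random(1)
alice_key, bob_key = bb84_sift(2000, rng)
print(len(alice_key), "sifted bits from 2000 qubits; keys agree:", alice_key == bob_key)
```

About half of the 2000 transmitted qubits survive the sifting, and the two retained strings are identical, as expected in the absence of errors or interference.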

So far nothing has been gained by using qubits. The 
important feature is, however, that it is impossible for 
anyone to learn Bob's measurement results by observ- 
ing the qubits en route, without leaving evidence of 
their presence. The crudest way for an eavesdropper
Eve to attempt to discover the key would be for her 
to intercept the qubits and measure them, then pass 
them on to Bob. On average half the time Eve guesses 
Alice's basis correctly and thus does not disturb the 
qubit. However, Eve's correct guesses do not coincide 
with Bob's, so Eve learns the state of half of the n 
qubits which Alice and Bob later decide to trust, and 
disturbs the other half, for example sending to Bob $|+\rangle$
for Alice's $|0\rangle$. Half of those disturbed will be projected
by Bob's measurement back onto the original state sent 
by Alice, so overall Eve corrupts n/4 bits of the RQT. 

Alice and Bob can now detect Eve's presence simply by 
randomly choosing n/2 bits of the RQT and announc- 
ing publicly the values they have. If they agree on all 
these bits, then they can trust that no eavesdropper 
was present, since the probability that Eve was present 
and they happened to choose n/2 uncorrupted bits is 
$(3/4)^{n/2} \simeq 3 \times 10^{-63}$ for n = 1000. The n/2 undisclosed
bits form the secret key. 
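
The intercept-and-resend attack can be checked with a similar toy simulation. The sketch below assumes that Eve measures every qubit in a randomly chosen basis and re-sends the state she found; the function name intercepted_error_rate is hypothetical. It estimates the fraction of the RQT that Eve corrupts and evaluates $(3/4)^{n/2}$ directly for n = 1000.

```python
import random

def intercepted_error_rate(num_qubits, rng):
    """Fraction of sifted bits corrupted when Eve measures every qubit in a random basis."""
    sifted = errors = 0
    for _ in range(num_qubits):
        alice_bit, alice_basis = rng.randint(0, 1), rng.randint(0, 1)
        eve_basis, bob_basis = rng.randint(0, 1), rng.randint(0, 1)
        # Eve's result is Alice's bit if her basis matches, otherwise random.
        eve_bit = alice_bit if eve_basis == alice_basis else rng.randint(0, 1)
        # Bob receives the state Eve re-sent, which is prepared in Eve's basis.
        bob_bit = eve_bit if bob_basis == eve_basis else rng.randint(0, 1)
        if bob_basis == alice_basis:   # only these cases enter the RQT
            sifted += 1
            errors += bob_bit != alice_bit
    return errors / sifted

rng = random.Random(7)
rate = intercepted_error_rate(200_000, rng)
n = 1000
p_undetected = 0.75 ** (n // 2)
print(f"error rate in the RQT: {rate:.3f}")
print(f"chance that n/2 = {n // 2} disclosed bits all agree: {p_undetected:.1e}")
```

The simulated error rate comes out close to 1/4, matching the n/4 corrupted bits derived above, and the chance of 500 disclosed bits all agreeing despite such an attack is of order $10^{-63}$.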

^Many other methods are possible; we adopt this one merely
to illustrate the concepts.

In practice the protocol is more complicated since Eve
might adopt other strategies (e.g. not intercept all the
qubits), and noise will corrupt some of the qubits even
in the absence of an eavesdropper. Instead of rejecting
the key if many of the disclosed bits differ, Alice
and Bob retain it as long as they find the error rate
to be well below 25%. They then process the key in
two steps. The first is to detect and remove errors,
which is done by publicly comparing parity checks on
publicly chosen random subsets of the bits, while discarding
bits to prevent increasing Eve's information.
The second step is to decrease Eve's knowledge of the 
key, by distilling from it a smaller key, composed of 
parity values calculated from the original key. In this 
way a key of around n/4 bits is obtained, of which Eve 
probably knows less than $10^{-6}$ of one bit (Bennett et al.
1992).
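
The second step, distilling a shorter key from parity values, can be illustrated with a deliberately simplified sketch. The subset sizes, key lengths and function names below are assumptions chosen for illustration and are not the construction analysed by Bennett et al.; the point is only that a single bit unknown to Eve changes every parity that involves it.

```python
import random

def parity(bits, positions):
    """Parity (sum modulo 2) of the bits at the given positions."""
    return sum(bits[p] for p in positions) % 2

def distil_key(key, subsets):
    """New key whose bits are parities of the chosen subsets of the old key."""
    return [parity(key, subset) for subset in subsets]

rng = random.Random(3)
key_length, output_length, subset_size = 64, 16, 8   # illustrative sizes only
shared_key = [rng.randint(0, 1) for _ in range(key_length)]
# The subsets are chosen at random and may be announced publicly.
subsets = [rng.sample(range(key_length), subset_size) for _ in range(output_length)]

alice_final = distil_key(shared_key, subsets)
bob_final = distil_key(list(shared_key), subsets)

# Eve is assumed to be wrong about a single bit, at position subsets[0][0].
eve_key = list(shared_key)
eve_key[subsets[0][0]] ^= 1
eve_final = distil_key(eve_key, subsets)
print(alice_final == bob_final, eve_final[0] != alice_final[0])
```

Alice and Bob, who hold identical strings after error removal, obtain identical distilled keys, whereas Eve's single wrong bit spoils the first parity value. This is why partial knowledge of the original key translates into much less knowledge of the distilled one.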

The protocol just described is not the only one possible. 
Another approach (Ekert 1991) involves the use of EPR 
pairs, which Alice and Bob measure along one of three 
different axes. To rule out eavesdropping they check 
for Bell-EPR correlations in their results. 

The great thing about quantum key distribution is 
that it is feasible with current technology. A pioneer- 
ing experiment (Bennett and Brassard 1989) demon- 
strated the principle, and much progress has been made 
since then. Hughes et al. (1995) and Phoenix and
Townsend (1995) summarised the state of affairs two
years ago, and recently Zbinden et al. (1997) have
reported excellent key distribution through 23 km of
standard telecom fibre under Lake Geneva. The qubits
are stored in the polarisation states of laser pulses, i.e. 
coherent states of light, with on average 0.1 photons 
per pulse. This low light level is necessary so that 
pulses containing more than one photon are unlikely. 
Such pulses would provide duplicate qubits, and hence 
a means for an eavesdropper to go undetected. The system
achieves a bit error rate of 1.35%, which is low
enough to guarantee privacy in the full protocol. The 
data transmission rate is rather low: MHz as opposed 
to the GHz rates common in classical communications, 
but the system is very reliable. 

Such spectacular experimental mastery is in contrast 
to the subject of the next section. 

6 The universal quantum computer

We now have sufficient concepts to understand the
jewel at the heart of quantum information theory,
namely, the quantum computer (QC). Ekert and Jozsa
(1996) and Barenco (1996) give introductory reviews
concentrating on the quantum computer and factorisation;
a review with emphasis on practicalities is provided
by Spiller (1996). Introductory material is also
provided by DiVincenzo (1995b) and Shor (1996).

The QC is first and foremost a machine which is a 
theoretical construct, like a thought-experiment, whose 
purpose is to allow quantum information processing to 
be formally analysed. In particular it establishes the
Church-Turing Principle introduced earlier.

Here is a prescription for a quantum computer, based 
on that of Deutsch (1985, 1989): 

A quantum computer is a set of n qubits in which the 
following operations are experimentally feasible: 

1. Each qubit can be prepared in some known state
$|0\rangle$.

2. Each qubit can be measured in the basis $\{|0\rangle, |1\rangle\}$.

3. A universal quantum gate (or set of gates) can 
be applied at will to any fixed-size subset of the 
qubits. 

4. The qubits do not evolve other than via the above 
transformations. 

This prescription is incomplete in certain technical 
ways to be discussed, but it encompasses the main 
ideas. The model of computation we have in mind is 
a network model, in which logic gates are applied se- 
quentially to a set of bits (here, quantum bits). In an 
electronic classical computer, logic gates are spread out 
in space on a circuit board, but in the QC we typically 
imagine the logic gates to be interactions turned on and 
off in time, with the qubits at fixed positions, as in a 
quantum network diagram (fig. 8, 12). Other models 
of quantum computation can be conceived, such as a 
cellular automaton model (Margolus 1990). 



6.1 Universal gate 

The universal quantum gate is the quantum equivalent
of the classical universal gate, namely a gate which
by its repeated use on different combinations of bits
can generate the action of any other gate. What is
the set of all possible quantum gates, however? To
answer this, we appeal to the principles of quantum
mechanics (Schrödinger's equation), and answer that
since all quantum evolution is unitary, it is sufficient
to be able to generate all unitary transformations of
the n qubits in the computer. This might seem a tall
order, since we have a continuous and therefore infinite
set. However, it turns out that quite simple quantum
gates can be universal, as Deutsch showed in 1985.

The simplest way to think about universal gates is to
consider the pair of gates $V(\theta, \phi)$ and controlled-not
(or XOR), where $V(\theta, \phi)$ is a general rotation of a single
qubit, i.e.

$$V(\theta, \phi) = \begin{pmatrix} \cos(\theta/2) & -i e^{-i\phi} \sin(\theta/2) \\ -i e^{i\phi} \sin(\theta/2) & \cos(\theta/2) \end{pmatrix}. \tag{33}$$

It can be shown that any n x n unitary matrix can
be formed by composing 2-qubit XOR gates and single-qubit
rotations. Therefore, this pair of operations is
universal for quantum computation. A purist may argue
that $V(\theta, \phi)$ is an infinite set of gates since the
parameters $\theta$ and $\phi$ are continuous, but it suffices to
choose two particular irrational angles for $\theta$ and $\phi$,
and the resulting single gate can generate all single-qubit
rotations by repeated application; however, a
practical system need not use such laborious methods.
The XOR and rotation operations can be combined to
make a controlled rotation which is a single universal
gate. Such universal quantum gates were discussed
by Deutsch et al. (1995), Lloyd (1995), DiVincenzo
(1995a) and Barenco (1995).
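
The two building blocks can be written down as explicit matrices and checked numerically. The sketch below assumes that the single-qubit rotation has rows $(\cos(\theta/2),\ -i e^{-i\phi}\sin(\theta/2))$ and $(-i e^{i\phi}\sin(\theta/2),\ \cos(\theta/2))$, and that the first qubit is the control of the XOR gate; the helper is_unitary and the particular angles are illustrative choices.

```python
import numpy as np

def V(theta, phi):
    """General single-qubit rotation with the matrix form assumed above."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * np.exp(-1j * phi) * s],
                     [-1j * np.exp(1j * phi) * s, c]])

# Controlled-not (XOR) in the basis |00>, |01>, |10>, |11>, first qubit as control.
XOR = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=complex)

def is_unitary(U):
    """True if U^dagger U is the identity to numerical precision."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

rotation = V(0.7, 1.3)                 # arbitrary illustrative angles
not_gate = 1j * V(np.pi, 0.0)          # V(pi, 0) equals NOT up to a global phase
# A small two-qubit network: rotate the second qubit, then apply XOR.
network = XOR @ np.kron(np.eye(2), rotation)

print(is_unitary(rotation), is_unitary(XOR), is_unitary(network))
print(np.round(not_gate.real, 6))
```

Any network composed in this way is automatically unitary, since it is a product of unitary matrices; the non-trivial statement in the text is the converse, that every unitary transformation can be reached.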

It is remarkable that 2-qubit gates are sufficient for 
quantum computation. This is why the quantum gate 
is a powerful and important concept. 



6.2 Church-Turing principle

Having presented the QC, it is necessary to argue for
its universality, i.e. that it fulfils the Church-Turing
Principle as claimed. The two-step argument is very 
simple. First, the state of any finite quantum system 
is simply a vector in Hilbert space, and therefore can be 
represented to arbitrary precision by a finite number of 
qubits. Secondly, the evolution of any finite quantum
system is a unitary transformation of the state, and
therefore can be simulated on the QC, which can generate
any unitary transformation with arbitrary precision.

A point of principle is raised by Myers (1997), who 
points out that there is a difficulty with computational
tasks for which the number of steps for completion can- 
not be predicted. We cannot in general observe the QC 
to find out if it has halted, in contrast to a classical 
computer. However, we will only be concerned with 
tasks where either the number of steps is predictable, 
or the QC can signal completion by setting a dedicated 
qubit which is otherwise not involved in the compu- 
tation (Deutsch 1985). This is a very broad class of 
problems. Nielsen and Chuang (1997) consider the use 
of a fixed quantum gate array, showing that there is 
no array which, operating on qubits representing both 
data and program, can perform any unitary transfor- 
mation on the data. However, we consider a machine 
in which a classical computer controls the quantum 
gates applied to a quantum register, so any gate array 
can be 'ordered' by a classical program to the classical 
computer. 

The QC is certainly an interesting theoretical tool. 
However, there hangs over it a large and important 
question-mark: what about imperfection? The pre- 
scription given above is written as if measurements and 
gates can be applied with arbitrary precision, which is 
unphysical, as is the fourth requirement (no extraneous 
evolution). The prescription can be made realistic by 
attaching to each of the four requirements a statement 
about the degree of allowable imprecision. This is a 
subject of on-going research, and we will take it up in
a later section. Meanwhile, let us investigate more specifically
what a sufficiently well-made quantum computer
might do. 



7 Quantum algorithms 

It is well known that classical computers are able to cal- 
culate the behaviour of quantum systems, so we have 
not yet demonstrated that a quantum computer can do 
anything which a classical computer can not. Indeed, 
since our theories of physics always involve equations 
which we can write down and manipulate, it seems
highly unlikely that quantum mechanics, or any future
physical theory, would permit computational problems 
to be addressed which are not in principle solvable on a 
large enough classical Turing machine. However, as we 
saw in section 3, those words 'large enough', and also
'fast enough', are centrally important in computer sci- 
ence. Problems which are computationally 'hard' can 
be impossible in practice. In technical language, while 
quantum computing does not enlarge the set of compu- 
tational problems which can be addressed (compared 
to classical computing), it does introduce the possibil- 
ity of new complexity classes. Put more simply, tasks 
for which classical computers are too slow may be solv- 
able with quantum computers. 



7.1 Simulation of physical systems 

The first and most obvious application of a QC is that 
of simulating some other quantum system. To simulate 
a state vector in a $2^n$-dimensional Hilbert space, a classical
computer needs to manipulate vectors containing
of order $2^n$ complex numbers, whereas a quantum computer
requires just n qubits, making it much more efficient
in storage space. To simulate evolution, in general
both the classical and quantum computers will be inefficient.
A classical computer must manipulate matrices
containing of order $2^{2n}$ elements, which requires a number
of operations (multiplication, addition) exponentially
large in n, while a quantum computer must build
unitary operations in $2^n$-dimensional Hilbert space,
which usually requires an exponentially large number
of elementary quantum logic gates. Therefore the
quantum computer is not guaranteed to simulate every 
physical system efficiently. However, it can be shown 
that it can simulate a large class of quantum systems 
efficiently, including many for which there is no effi- 
cient classical algorithm, such as many-body systems 
with local interactions (Lloyd 1996, Zalka 1996, Wiesner
1996, Meyer 1996, Lidar and Biham 1996, Abrams
and Lloyd 1997, Boghosian and Taylor 1997). 
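
To give a feeling for the classical storage cost, the following sketch counts the memory needed to hold the $2^n$ complex amplitudes of an n-qubit state vector. The figure of 16 bytes per amplitude (two double-precision numbers) and the function name state_vector_bytes are assumptions made for illustration.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store 2^n complex amplitudes at the given size per amplitude."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (10, 30, 50):
    print(f"{n} qubits: {state_vector_bytes(n):.3e} bytes")
```

Ten qubits need only a few kilobytes, thirty need about 17 gigabytes, and fifty need about 18 petabytes, whereas the quantum computer in each case needs just n qubits.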



7.2 Period finding and Shor's factorisation algorithm

So far we have discussed simulation of Nature, which is 
a rather restricted type of computation. We would like
to let the QC loose on more general problems, but it
has so far proved hard to find ones on which it performs 
better than classical computers. However, the fact that 
there exist such problems at all is a profound insight 
into physics, and has stimulated much of the recent 
interest in the field. 

Currently one of the most important quantum algo- 
rithms is that for finding the period of a function. 
Suppose a function $f(x)$ is periodic with period r, i.e.
$f(x) = f(x + r)$. Suppose further that $f(x)$ can be
efficiently computed from x, and all we know initially
is that $N/2 < r < N$ for some N. Assuming there is
no analytic technique to deduce the period of $f(x)$, the
best we can do on a classical computer is to calculate
$f(x)$ for of order N/2 values of x, and find out when
the function repeats itself (for well-behaved functions
only $O(\sqrt{N})$ values may be needed on average). This
is inefficient since the number of operations is exponential
in the input size log N (the information required
to specify N).

The task can be solved efficiently on a QC by the elegant
method shown in fig. 10, due to Shor (1994),
building on Simon (1994). The QC requires 2n qubits,
plus a further O(n) for workspace, where $n = \lceil 2 \log N \rceil$
(the notation $\lceil x \rceil$ means the nearest integer greater
than x). These are divided into two 'registers', each
of n qubits. They will be referred to as the x and y
registers; both are initially prepared in the state $|0\rangle$
(i.e. all n qubits in states $|0\rangle$). Next, the operation H
is applied to each qubit in the x register, making the
total state

$$\frac{1}{\sqrt{w}} \sum_{x=0}^{w-1} |x\rangle |0\rangle \tag{34}$$

where $w = 2^n$. This operation is referred to as a
Fourier transform in fig. 10, for reasons that will
shortly become apparent. The notation $|x\rangle$ means a
state such as $|0011010\rangle$, where 0011010 is the integer x
in binary notation. In this context the basis $\{|0\rangle, |1\rangle\}$
is referred to as the 'computational basis'. It is convenient
(though not of course necessary) to use this basis
when describing the computer.
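
That applying H to every qubit of $|0\rangle$ yields an equal superposition of all w computational basis states is easy to confirm numerically for a small register. The register size n = 4 below is an arbitrary illustrative choice, and only the x register is modelled.

```python
import numpy as np

n = 4                       # illustrative register size
w = 2 ** n
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Build H applied to each of the n qubits as a Kronecker product.
H_all = np.array([[1.0]])
for _ in range(n):
    H_all = np.kron(H_all, H)

zero_state = np.zeros(w)
zero_state[0] = 1.0         # |00...0>
state = H_all @ zero_state

print(np.allclose(state, np.ones(w) / np.sqrt(w)))
```

Every one of the w amplitudes equals $1/\sqrt{w}$, which is the x-register part of the total state given above.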

Next, a network of logic gates is applied to both x and
y registers, to perform the transformation
$U_f |x\rangle |0\rangle = |x\rangle |f(x)\rangle$. Note that this transformation can be unitary
because the input state $|x\rangle |0\rangle$ is in one-to-one
correspondence with the output state $|x\rangle |f(x)\rangle$, so the
process is reversible. Now, applying $U_f$ to the state
given in eq. (34), we obtain

$$\frac{1}{\sqrt{w}} \sum_{x=0}^{w-1} |x\rangle |f(x)\rangle \tag{35}$$

This state is illustrated in fig. 11a. At this point something
rather wonderful has taken place: the value of
$f(x)$ has been calculated for $w = 2^n$ values of x, all in
one go! This feature is referred to as quantum parallelism
and represents a huge parallelism because of the
exponential dependence on n (imagine having $2^{100}$, i.e.
a million times Avogadro's number, of classical processors!)

Although the $2^n$ evaluations of $f(x)$ are in some sense
'present' in the quantum state in eq. (35), unfortunately
we cannot gain direct access to them. For, a
measurement (in the computational basis) of the y register,
which is the next step in the algorithm, will only
reveal one value of $f(x)$.^ Suppose the value obtained
is $f(x) = u$. The y register state collapses onto $|u\rangle$,
and the total state becomes

$$\frac{1}{\sqrt{M}} \sum_{j=0}^{M-1} |d_u + jr\rangle |u\rangle \tag{36}$$

where $d_u + jr$, for $j = 0, 1, 2 \ldots M - 1$, are all the
values of x for which $f(x) = u$. In other words the
periodicity of $f(x)$ means that the x register remains
in a superposition of $M \simeq w/r$ states, at values of x
separated by the period r. Note that the offset $d_u$ of
the set of x values depends on the value u obtained in
the measurement of the y register.

It now remains to extract the periodicity of the state
in the x register. This is done by applying a Fourier
transform, and then measuring the state. The discrete
Fourier transform employed is the following unitary
process:

$$U_{FT} |x\rangle = \frac{1}{\sqrt{w}} \sum_{k=0}^{w-1} e^{i 2\pi k x / w} |k\rangle \tag{37}$$

Note that eq. (34) is an example of this, operating on
the initial state $|0\rangle$. The quantum network to apply
$U_{FT}$ is based on the fast Fourier transform algorithm
(see, e.g., Knuth (1981)). The quantum version was
worked out by Coppersmith (1994) and Deutsch (1994)
independently; a clear presentation may also be found
in Ekert and Jozsa (1996), Barenco (1996)^. Before
applying $U_{FT}$ to eq. (36) we will make the simplifying
assumption that r divides w exactly, so $M = w/r$. The
essential ideas are not affected by this restriction; when
it is relaxed some added complications must be taken
into account (Shor 1994, 1995a; Ekert and Jozsa 1996).

^It is not strictly necessary to measure the y register, but this
simplifies the description.

The y register no longer concerns us, so we will just
consider the x state from eq. (36):

$$U_{FT} \frac{1}{\sqrt{w/r}} \sum_{j=0}^{w/r-1} |d_u + jr\rangle = \frac{1}{\sqrt{r}} \sum_{k} \tilde{f}(k) |k\rangle \tag{38}$$

where

$$|\tilde{f}(k)| = \begin{cases} 1 & \text{if } k \text{ is a multiple of } w/r \\ 0 & \text{otherwise} \end{cases} \tag{39}$$

This state is illustrated in fig. 11b. The final state of
the x register is now measured, and we see that the
value obtained must be a multiple of w/r. It remains
to deduce r from this. We have $x = \lambda w/r$ where $\lambda$
is unknown. If $\lambda$ and r have no common factors, then
we cancel x/w down to an irreducible fraction and thus
obtain $\lambda$ and r. If $\lambda$ and r have a common factor, which
is unlikely for large r, then the algorithm fails. In this
case, the whole algorithm must be repeated from the
start. After a number of repetitions no greater than
log r, and usually much less than this, the probability
of success can be shown to be arbitrarily close to 1
(Ekert and Jozsa 1996).
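
The whole sequence of steps (measure the y register, Fourier transform the x register, measure it, and reduce x/w to lowest terms) can be imitated on a classical computer for a very small example. The sketch below is a classical simulation, not a quantum program: it stores all w amplitudes explicitly, assumes as in the text that r divides w exactly, and uses the illustrative function $f(x) = 7^x \bmod 15$, whose period is 4. The name find_period and the retry limit are hypothetical.

```python
import numpy as np
from fractions import Fraction

def find_period(f, n, rng, max_runs=20):
    """Classically simulate the period-finding steps for f on an n-qubit x register.

    Assumes the period r divides w = 2^n exactly and that f takes r distinct
    values within one period.
    """
    w = 2 ** n
    values = np.array([f(x) for x in range(w)])
    for _ in range(max_runs):
        # Measuring the y register yields some u = f(x); the x register is left
        # in an equal superposition of all x with f(x) = u.
        u = values[rng.integers(w)]
        x_state = (values == u).astype(complex)
        x_state /= np.linalg.norm(x_state)
        # Fourier transform with the e^{+2 pi i k x / w} sign and 1/sqrt(w) factor.
        amplitudes = np.fft.ifft(x_state, norm="ortho")
        probabilities = np.abs(amplitudes) ** 2
        k = rng.choice(w, p=probabilities / probabilities.sum())
        # k/w = lambda/r; reducing to lowest terms gives a candidate for r.
        candidate = Fraction(int(k), w).denominator
        if f(candidate) == f(0):
            return candidate          # the candidate really is a period
        # Otherwise lambda and r shared a factor (or k = 0): run again.
    return None

rng = np.random.default_rng(0)
r = find_period(lambda x: pow(7, x, 15), n=8, rng=rng)
print("period found:", r)
```

In this example only the outcomes k = 0, 64, 128 and 192 have appreciable probability. The values 64 and 192 give the fractions 1/4 and 3/4 and hence r = 4, while 0 and 128 give no useful answer and force a repetition, which is the failure mode described above.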

The quantum period-finding algorithm we have described
is efficient as long as $U_f$, the evaluation of $f(x)$,
is efficient. The total number of elementary logic gates
required is a polynomial rather than exponential function
of n. As was emphasised in section 3, this makes
all the difference between tractable and intractable in
practice, for sufficiently large n.

^An exact quantum Fourier transform would require rotation
operations of precision exponential in n, which raises a problem
with the efficiency of Shor's algorithm. However, an approximate
version of the Fourier transform is sufficient (Barenco et al.
1996).

To add the icing on the cake, it can be remarked that
the important factorisation problem mentioned in section
3.2 can be reduced to one of finding the period of
a simple function. This and all the above ingredients
were first brought together by Shor (1994), who thus
showed that the factorisation problem is tractable on
an ideal quantum computer. The function to be evaluated
in this case is $f(x) = a^x \bmod N$ where N is
the number to be factorised, and $a < N$ is chosen randomly.
One can show using elementary number theory
(Ekert and Jozsa 1996) that for most choices of a, the
period r is even and $a^{r/2} \pm 1$ shares a common factor
with N. The common factor (which is of course a factor
of N) can then be deduced rapidly using a classical
algorithm due to Euclid (circa 300 BC; see, e.g. Hardy
and Wright 1965).
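
The classical part of this reduction is short enough to write out. In the sketch below the period is found by brute force, standing in for the quantum step, so it is only usable for tiny N; the function names classical_period and factor_from_period are hypothetical, and Euclid's algorithm is provided by the standard library function math.gcd.

```python
from math import gcd

def classical_period(a, N):
    """Period of f(x) = a^x mod N by brute force (stand-in for the quantum step).

    Requires gcd(a, N) == 1, otherwise a^x mod N never returns to 1.
    """
    value, r = a % N, 1
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factor_from_period(a, N):
    """Try to split N using the period r of a^x mod N; returns a factor pair or None."""
    if gcd(a, N) != 1:
        return gcd(a, N), N // gcd(a, N)    # lucky guess: a already shares a factor
    r = classical_period(a, N)
    if r % 2 == 1:
        return None                          # odd period: choose another a
    half_power = pow(a, r // 2, N)
    for candidate in (gcd(half_power - 1, N), gcd(half_power + 1, N)):
        if 1 < candidate < N:
            return candidate, N // candidate
    return None                              # a^(r/2) = -1 mod N: choose another a

print(factor_from_period(7, 15))
```

For N = 15 and a = 7 the period is 4, so $a^{r/2} = 49 \equiv 4 \pmod{15}$, and gcd(4 - 1, 15) = 3 yields the factors 3 and 5. A choice such as a = 14 fails, because $14 \equiv -1 \pmod{15}$, and a different a must then be tried.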

To evaluate $f(x)$ efficiently, repeated squaring (modulo
N) is used, giving powers $((a^2)^2)^2 \ldots$. Selected such
powers of a, corresponding to the binary expansion of
x, are then multiplied together. Complete networks
for the whole of Shor's algorithm were described by
Miquel et al. (1996), Vedral et al. (1996) and Beckman
et al. (1996). They require of order $300 (\log N)^3$
logic gates. Therefore, to factorise numbers of order
$10^{130}$, i.e. at the limit of current classical methods,
would require $2 \times 10^{10}$ gates per run, or 7 hours if
the 'switching rate' is one megahertz*. Considering
how difficult it is to make a quantum computer, this
offers no advantage over classical computation. However,
if we double the number of digits to 260 then
the problem is intractable classically (see section 3.2),
while the ideal quantum computer takes just 8 times
longer than before. The existence of such a powerful
method is an exciting and profound new insight into
quantum theory.
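
Both the repeated-squaring method and the gate-count estimate can be reproduced in a few lines. The sketch assumes that the logarithm in the gate count is taken to base 2; the function names mod_power and gates_per_run are hypothetical, and mod_power is a classical illustration of the arithmetic, not a quantum network.

```python
import math

def mod_power(a, x, N):
    """Compute a^x mod N by repeated squaring, driven by the binary expansion of x."""
    result = 1
    square = a % N                  # a^(2^0), then a^(2^1), a^(2^2), ...
    while x > 0:
        if x & 1:                   # this binary digit of x is set
            result = (result * square) % N
        square = (square * square) % N
        x >>= 1
    return result

def gates_per_run(decimal_digits):
    """Estimate 300 (log2 N)^3 gates for a number N with the given number of digits."""
    log2_N = decimal_digits * math.log2(10)
    return 300 * log2_N ** 3

small = gates_per_run(130)
large = gates_per_run(260)
print(mod_power(7, 123, 15) == pow(7, 123, 15))
print(f"130 digits: {small:.1e} gates, {small / 1e6 / 3600:.1f} hours at 1 MHz")
print(f"260 digits: {large / small:.0f} times more gates")
```

With these assumptions a 130-digit number needs about $2.4 \times 10^{10}$ gates, or just under 7 hours at one megahertz, and doubling the number of digits multiplies the count by $2^3 = 8$.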

The period-finding algorithm appears at first sight like
a conjuring trick: it is not quite clear how the quantum
computer managed to produce the period like a
rabbit out of a hat. Examining fig. 11 and equations
(34) to (38), I would say that the most important features
are contained in eq. (35). They are not only the
quantum parallelism already mentioned, but also quantum
entanglement, and, finally, quantum interference.
Each value of $f(x)$ retains a link with the value of x
which produced it, through the entanglement of the x
and y registers in eq. (35). The 'magic' happens when
a measurement of the y register produces the special
state $|\psi\rangle$ (eq. (36)) in the x register, and it is quantum
entanglement which permits this (see also Jozsa
1997a). The final Fourier transform can be regarded as
an interference between the various superposed states
in the x register (compare with the action of a diffraction
grating).

*The algorithm might need to be run log r ~ 60 times to
ensure at least one successful run, but the average number of
runs required will be much less than this.

Interference effects can be used for computational pur- 
poses with classical light fields, or water waves for that 
matter, so interference is not in itself the essentially 
quantum feature. Rather, the exponentially large num- 
ber of interfering states, and the entanglement, are fea- 
tures which do not arise in classical systems. 



7.3 Grover's search algorithm 

Despite considerable efforts in the quantum computing 
community, the number of useful quantum algorithms 
which have been discovered remains small. They con- 
sist mainly of variants on the period-finding algorithm 
presented above, and another quite different task: that 
of searching an unstructured list. Grover (1997) presented
a quantum algorithm for the following problem:
given an unstructured list of items $\{x_i\}$, find a particular
item $x_j = t$. Think, for example, of looking for a
particular telephone number in the telephone directory
(for someone whose name you do not know). It is not
hard to prove that classical algorithms can do no better
than searching through the list, requiring on average
N/2 steps, for a list of N items. Grover's algorithm
requires of order $\sqrt{N}$ steps. The task remains computationally
hard: it is not transferred to a new complexity
class, but it is remarkable that such a seemingly
hopeless task can be speeded up at all. The 'quantum
speed-up' $\sim \sqrt{N}/2$ is greater than that achieved
by Shor's factorisation algorithm ($\sim \exp(2 (\ln N)^{1/3})$),
and would be important for the huge sets ($N \sim 10^{16}$)
which can arise, for example, in code-breaking prob- 
lems (Brassard 1997). 

An important further point was proved by Bennett et al.
(1997), namely that Grover's algorithm is optimal:
no quantum algorithm can do better than $O(\sqrt{N})$.

A brief sketch of Grover's algorithm is as follows. Each
item has a label i, and we must be able to test in a
unitary way whether any item is the one we are seeking.
In other words there must exist a unitary operator S
such that $S|i\rangle = |i\rangle$ if $i \ne j$, and $S|j\rangle = -|j\rangle$, where
j is the label of the special item. For example, the
test might establish whether i is the solution of some
hard computational problem^. The method begins by
placing a single quantum register in a superposition
of all computational states, as in the period-finding
algorithm (eq. (34)). Define

$$|\Psi(\theta)\rangle \equiv \sin\theta\, |j\rangle + \frac{\cos\theta}{\sqrt{N-1}} \sum_{i \ne j} |i\rangle \tag{40}$$

where j is the label of the element $t = x_j$ to be found.
The initially prepared state is an equally-weighted superposition,
$|\Psi(\theta_0)\rangle$ where $\sin\theta_0 = 1/\sqrt{N}$. Now apply
S, which reverses the sign of the one special element of
the superposition, then Fourier transform, change the
sign of all components except $|0\rangle$, and Fourier transform
back again. These operations represent a subtle
interference effect which achieves the following transformation:

$$U_G |\Psi(\theta)\rangle = |\Psi(\theta + \phi)\rangle \tag{41}$$

where $\sin\phi = 2\sqrt{N-1}/N$. The coefficient of the special
element is now slightly larger than that of all the
other elements. The method proceeds simply by applying
$U_G$ m times, where $m \simeq (\pi/4)\sqrt{N}$. The slow rotation
brings $\theta$ very close to $\pi/2$, so the quantum state
becomes almost precisely equal to $|j\rangle$. After the m iterations
the state is measured and the value j obtained
(with error probability O(1/N)). If $U_G$ is applied too
many times, the success probability diminishes, so it is
important to know m, which was deduced by Boyer et al.
(1996). Kristen Fuchs compares the technique to
cooking a soufflé. The state is placed in the 'quantum
oven' and the desired answer rises slowly. You must
open the oven at the right time, neither too soon nor
too late, to guarantee success. Otherwise the soufflé
will fall: the state collapses to the wrong answer.
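
The rise and fall of the success probability can be seen in a small classical simulation of the amplitudes. The sketch assumes the standard form of the iteration, in which the sequence "Fourier transform, change the sign of all components except $|0\rangle$, Fourier transform back" acts on the amplitudes as a reflection about their mean; the list size N = 1024, the target label and the function name are illustrative choices.

```python
import numpy as np

def grover_success_probability(N, target, iterations):
    """Probability of measuring `target` after the given number of Grover iterations."""
    state = np.full(N, 1 / np.sqrt(N))        # equally weighted superposition
    for _ in range(iterations):
        state[target] *= -1                   # S: flip the sign of the special element
        state = 2 * state.mean() - state      # reflect every amplitude about the mean
    return float(state[target] ** 2)

N, target = 1024, 123
m = int(round(np.pi / 4 * np.sqrt(N)))        # m ~ (pi/4) sqrt(N)
for iterations in (m // 2, m, 2 * m):
    p = grover_success_probability(N, target, iterations)
    print(f"{iterations:3d} iterations: success probability {p:.4f}")
```

For N = 1024 the optimal number of iterations is m = 25. Stopping at half that number gives a success probability near one half, stopping at m gives a probability above 0.99, and continuing to 2m brings it back down almost to zero, which is the soufflé falling.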

The two algorithms I have presented are the easiest to
describe, and illustrate many of the methods of quantum
computation. However, just what further methods
may exist is an open question. Kitaev (1996) has shown
how to solve the factorisation and related problems using
a technique fundamentally different from Shor's.
His ideas have some similarities to Grover's. Kitaev's
method is helpfully clarified by Jozsa (1997b) who also
brings out the common features of several quantum algorithms
based on Fourier transforms. The quantum
programmer's toolbox is thus slowly growing. It seems
safe to predict, however, that the class of problems for
which quantum computers out-perform classical ones is
a special and therefore small class. On the other hand,
any problem for which finding solutions is hard, but
testing a candidate solution is easy, can as a last resort
be solved by an exhaustive search, and here Grover's
algorithm may prove very useful.

^That is, an "NP" problem for which finding a solution is hard,
but testing a proposed solution is easy.

8 Experimental quantum information processors

The most elementary quantum logical operations have 
been demonstrated in many physics experiments during
the past 50 years. For example, the NOT operation
(X) is no more than a stimulated transition between
two energy levels $|0\rangle$ and $|1\rangle$. The important XOR operation
can also be identified as a driven transition in a
four-level system. However, if we wish to contemplate 
a quantum computer it is necessary to find a system 
which is sufficiently controllable to allow quantum logic 
gates to be applied at will, and yet is sufficiently com- 
plicated to store many qubits of quantum information. 

It is very hard to find such systems. One might hope to 
fabricate quantum devices on solid state microchips — 
this is the logical progression of the microfabrication 
techniques which have allowed classical computers to 
become so powerful. However, quantum computation 
relies on complicated interference effects and the great 
problem in realising it is the problem of noise. No 
quantum system is really isolated, and the coupling to 
the environment produces decoherence which destroys 
the quantum computation. In solid state devices the 
environment is the substrate, and the coupling to this 
environment is strong, producing typical decoherence 
times of the order of picoseconds. It is important to realise
that it is not enough to have two different states
$|0\rangle$ and $|1\rangle$ which are themselves stable (for example
states of different current in a superconductor): we require
also that superpositions such as $|0\rangle + |1\rangle$ preserve
their phase, and this is typically where the decoherence
timescale is so short.

At present there are two candidate systems which 
should permit quantum computation on 10 to 40 
qubits. These are the proposal of Cirac and Zoller 
(1995) using a line of singly charged atoms confined 
and cooled in vacuum in an ion trap, and the proposal
of Gershenfeld and Chuang (1997), and simultaneously
Cory et. al. (1996), using the methods of bulk
nuclear magnetic resonance. In both cases the propos- 
als rely on the impressive efforts of a large commu- 
nity of researchers which developed the experimental 
techniques. Previous proposals for experimental quantum
computation (Lloyd 1993, Berman et. al. 1994,
Barenco et. al. 1995a, DiVincenzo 1995b) touched on
some of the important methods but were not experimentally
feasible. Further recent proposals (Privman
et. al. 1997, Loss and DiVincenzo 1997) may become
feasible in the near future.



8.1 Ion trap 

The ion trap method is illustrated in fig. 12, and de- 
scribed in detail by Steane (1997b). A string of ions is 
confined by a combination of oscillating and static electric
fields in a linear 'Paul trap' in high vacuum ($10^{-8}$
Pa). A single laser beam is split by beam splitters and
acousto-optic modulators into many beam pairs, one 
pair illuminating each ion. Each ion has two long-lived 
states, for example different levels of the ground state 
hyperfine structure (the lifetime of such states against 
spontaneous decay can exceed thousands of years) . Let 
us refer to these two states as $|g\rangle$ and $|e\rangle$; they are
orthogonal and so together represent one qubit. Each
laser beam pair can drive coherent Raman transitions 
between the internal states of the relevant ion. This 
allows any single-qubit quantum gate to be applied to 
any ion, but not two-qubit gates. The latter requires 
an interaction between ions, and this is provided by 
their Coulomb repulsion. However, exactly how to use 
this interaction is far from obvious; it required the im- 
portant insight of Cirac and Zoller. 

Light carries not only energy but also momentum, so 
whenever a laser beam pair interacts with an ion, it 
exchanges momentum with the ion. In fact, the mutual
repulsion of the ions means that the whole string
of ions moves en masse when the motion is quantised
(Mössbauer effect). The motion of the ion string is
quantised because the ion string is confined in the po- 
tential provided by the Paul trap. The quantum states 
of motion correspond to the different degrees of exci- 
tation ('phonons') of the normal modes of vibration 
of the string. In particular we focus on the ground 
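state of the motion $|n = 0\rangle$ and the lowest excited
state $|n = 1\rangle$ of the fundamental mode. To achieve,
for example, controlled-Z between ion x and ion y, we
start with the motion in the ground state $|n = 0\rangle$. A
pulse of the laser beams on ion x drives the transition
$|n = 0\rangle |g\rangle_x \rightarrow |n = 0\rangle |g\rangle_x$, $|n = 0\rangle |e\rangle_x \rightarrow |n = 1\rangle |g\rangle_x$,
so the ion finishes in the ground state, and the motion
finishes in the initial state of the ion: this is a 'swap'
operation. Next a pulse of the laser beams on ion y
drives the transition

\[
\begin{aligned}
|n = 0\rangle |g\rangle_y &\rightarrow |n = 0\rangle |g\rangle_y \\
|n = 0\rangle |e\rangle_y &\rightarrow |n = 0\rangle |e\rangle_y \\
|n = 1\rangle |g\rangle_y &\rightarrow |n = 1\rangle |g\rangle_y \\
|n = 1\rangle |e\rangle_y &\rightarrow -|n = 1\rangle |e\rangle_y
\end{aligned}
\]

Finally, we repeat the initial pulse on ion x. The overall
effect of the three pulses is

\[
\begin{aligned}
|n = 0\rangle |g\rangle_x |g\rangle_y &\rightarrow |n = 0\rangle |g\rangle_x |g\rangle_y \\
|n = 0\rangle |g\rangle_x |e\rangle_y &\rightarrow |n = 0\rangle |g\rangle_x |e\rangle_y \\
|n = 0\rangle |e\rangle_x |g\rangle_y &\rightarrow |n = 0\rangle |e\rangle_x |g\rangle_y \\
|n = 0\rangle |e\rangle_x |e\rangle_y &\rightarrow -|n = 0\rangle |e\rangle_x |e\rangle_y
\end{aligned}
\]

which is exactly a controlled-Z between x and y. Each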
laser pulse must have a precisely controlled frequency 
and duration. The controlled-Z gate and the single-qubit
gates together provide a universal set, so we can
perform arbitrary transformations of the joint state of 
all the ions! 
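
The bookkeeping of the three pulses can be checked with a few lines of code. The following is only a sketch under simplifying assumptions: each pulse is treated as the idealised map written above, any extra phase factors that a real laser pulse would introduce are ignored, and the function names (apply, swap_x, phase_y, three_pulses) are illustrative choices that do not come from the literature.

```python
# Basis states are labelled (n, x, y): phonon number n in {0, 1},
# internal states of ions x and y in {'g', 'e'}.
# A state vector is a dict mapping labels to amplitudes.

def apply(state, rule):
    """Apply a map on basis labels; rule(label) returns (new_label, factor)."""
    out = {}
    for label, amp in state.items():
        new_label, factor = rule(label)
        out[new_label] = out.get(new_label, 0) + factor * amp
    return out

def swap_x(label):
    # Idealised pulse on ion x: |n=0>|e>_x <-> |n=1>|g>_x, other states untouched.
    n, x, y = label
    if (n, x) == (0, 'e'):
        return (1, 'g', y), 1
    if (n, x) == (1, 'g'):
        return (0, 'e', y), 1
    return label, 1

def phase_y(label):
    # Pulse on ion y: only |n=1>|e>_y changes sign.
    n, x, y = label
    return label, (-1 if (n, y) == (1, 'e') else 1)

def three_pulses(state):
    """Swap on x, conditional sign flip on y, swap on x again."""
    for rule in (swap_x, phase_y, swap_x):
        state = apply(state, rule)
    return state

for x in 'ge':
    for y in 'ge':
        print(x, y, three_pulses({(0, x, y): 1}))
```

Only the input with both ions in the excited state comes back with a sign change, and the motion always returns to n = 0, so the phonon mode acts purely as a temporary bus.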



To complete the prescription for a quantum computer
(section 6), we must be able to prepare the initial
state and measure the final state. The first is possible
through the methods of optical pumping and laser cooling,
the second through the 'quantum jump' or 'electron
shelving' measurement technique. All these are
powerful techniques developed in the atomic physics
community over the past twenty years. However, the
combination of all the techniques at once has only been
achieved in a single experiment, which demonstrated
preparation, quantum gates, and measurement for just
a single trapped ion (Monroe et. al. 1995b).

The chief experimental difficulty in the ion trap method
is to cool the string of ions to the ground state of the
trap (a sub-microKelvin temperature), and the chief
source of decoherence is the heating of this motion owing
to the coupling between the charged ion string and
noise voltages in the electrodes (Steane 1997, Wineland
et. al. 1997). It is unknown just how much the heating
can be reduced. A conservative statement is that
in the next few years 100 quantum gates could be applied
to a few ions without losing coherence. In the
longer term one may hope for an order of magnitude
increase in both figures. It seems clear that an ion trap
processor will never achieve sufficient storage capacity
and coherence to permit factorisation of hundred-digit
numbers. However, it would be fascinating to try a
quantum algorithm on just a few qubits (4 to 10) and
thus to observe the principles of quantum information
processing at work. We will discuss in section 9 methods
which should allow the number of coherent gate
operations to be greatly increased.



8.2 Nuclear magnetic resonance

The proposal using nuclear magnetic resonance (NMR)
is illustrated in fig. 13. The quantum processor in this
case is a molecule containing a 'backbone' of about ten
atoms, with other atoms such as hydrogen attached
so as to use up all the chemical bonds. It is the nuclei
which interest us. Each has a magnetic moment
associated with the nuclear spin, and the spin states
provide the qubits. The molecule is placed in a large
magnetic field, and the spin states of the nuclei are
manipulated by applying oscillating magnetic fields in
pulses of controlled duration.



So far, so good. The problem is that the spin state 
of the nuclei of a single molecule can be neither pre- 
pared nor measured. To circumvent this problem, we 
use not a single molecule, but a cup of liquid containing
some $10^{20}$ molecules! We then measure the average
spin state, which can be achieved since the average
oscillating magnetic moment of all the nuclei is large 
enough to produce a detectable magnetic field. Some 
subtleties enter at this point. Each of the molecules in 
the liquid has a very slightly different local magnetic 
field, influenced by other molecules in the vicinity, so
each 'quantum processor' evolves slightly differently. 
This problem is circumvented by the spin-echo tech- 
nique, a standard tool in NMR which allows the effects 
of free evolution of the spins to be reversed, without 
reversing the effect of the quantum gates. However, 
this increases the difficulty of applying long sequences 
of quantum gates. 

The remaining problem is to prepare the initial state. 
The cup of liquid is in thermal equilibrium to begin
with, so the different spin states have occupation
probabilities given by the Boltzmann distribution. One
makes use of the fact that spin states are close in energy,
and so have nearly equal occupations initially.
Thus the density matrix $\rho$ of the $O(10^{20})$ nuclear spins
is very close to the identity matrix $I$. It is the small
difference $\Delta = \rho - I$ which can be used to store quantum
information. Although $\Delta$ is not the density matrix of
any quantum system, it nevertheless transforms under
well-chosen field pulses in the same way as a density
matrix would, and hence can be considered to represent
an effective quantum computer. The reader is
referred to Gershenfeld and Chuang (1997) for a detailed
description, including the further subtlety that
an effective pure state must be distilled out of $\Delta$ by
means of a pulse sequence which performs quantum
data compression.
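
To get a feeling for how small the departure from equal occupations is, one can evaluate the Boltzmann factor for a single spin-half nucleus. The sketch below is illustrative only: the resonance frequency (500 MHz) and the temperature (300 K) are assumed values chosen as plausible laboratory figures, not numbers taken from the text.

```python
import math

H_PLANCK = 6.62607015e-34    # Planck constant, J s
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J / K

def polarisation(frequency_hz, temperature_k):
    """Population difference (lower minus upper) of a two-level spin in thermal equilibrium."""
    x = H_PLANCK * frequency_hz / (K_BOLTZMANN * temperature_k)
    # p_lower = 1 / (1 + exp(-x)), p_upper = exp(-x) / (1 + exp(-x)),
    # so the difference is tanh(x / 2).
    return math.tanh(x / 2)

# Assumed values, not from the text: 500 MHz resonance, room temperature.
p = polarisation(500e6, 300.0)
print(f"population difference: {p:.1e}")
```

With these assumed values the two occupations differ by only a few parts in $10^5$, which is why the useful signal is carried by the small deviation of the density matrix rather than by the matrix as a whole.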

NMR experiments have for some years routinely 
achieved spin state manipulations and measurements 
equivalent in complexity to those required for quan- 
tum information processing on a few qubits, therefore 
the first few-qubit quantum processors will be NMR 
systems. The method does not scale very well as the 
number of qubits is increased, however. For example, 
with n qubits the measured signal scales as $2^{-n}$. Also
the possibility to measure the state is limited, since
only the average state of many processors is detectable.
This restricts the ability to apply quantum error correction
(section 9), and complicates the design of quantum
algorithms.



8.3 High-Q optical cavities 

Both systems we have described permit simple quan- 
tum information processing, but not quantum commu- 
nication. However, in a very high-quality optical cav- 
ity, a strong coupling can be achieved between a single 
atom or ion and a single mode of the electromagnetic 
field. This coupling can be used to apply quantum 
gates between the field mode and the ion, thus opening 
the way to transferring quantum information between 
separated ion traps, via high-Q optical cavities and op- 
tical fibres (Cirac et. al. 1997). Such experiments are 
now being contemplated. The required strong coupling 
between a cavity field and an atom has been demon- 
strated by Brune et. al. (1994), and Turchette et. al. 
(1995). An electromagnetic field mode can also be used 
to couple ions within a single trap, providing a faster 
alternative to the phonon method (Pellizzari et. al. 
1995). 



9 Quantum error correction 

In section 7 we discussed some beautiful quantum algorithms.
Their power only rivals classical computers,
however, on quite large problems, requiring thousands
of qubits and billions of quantum gates (with the possible
exception of algorithms for simulation of physical
systems). In section 8 we examined some experimental
systems, and found that we can only contemplate
'computers' of a few tens of qubits and perhaps some 
thousands of gates. Such systems are not 'computers' 
at all because they are not sufficiently versatile: they 
should at best be called modest quantum information 
processors. Whence came this huge disparity between 
the hope and the reality? 

The problem is that the prescription for the universal
quantum computer, section 6, is unphysical in its
fourth requirement. There is no such thing as a perfect 
quantum gate, nor is there such a thing as an isolated 
system. One may hope that it is possible in principle to 
achieve any degree of perfection in a real device, but 
in practice this is an impossible dream. Gates such 
as XOR rely on a coupling between separated qubits, 
but if qubits are coupled to each other, they will un- 
avoidably be coupled to something else as well (Plenio 
and Knight 1996). A rough guide is that it is very 
hard to find a system in which the loss of coherence 
is smaller than one part in a million each time an XOR
gate is applied. This means the decoherence is roughly
$10^7$ times too fast to allow factorisation of a 130 digit
number! It is an open question whether the laws of 
physics offer any intrinsic lower limit to the decoherence
rate, but it is safe to say that it would be simpler
to speed up classical computation by a factor of
$10^6$ than to achieve such low decoherence in a large
quantum computer. Such arguments were eloquently 
put forward by Haroche and Raimond (1996). Their 
work, and that of others such as Landauer (1995,1996) 
sounds a helpful note of caution. More detailed treat- 
ments of decoherence in quantum computers are given 
by Unruh (1995), Palma et. al. (1996) and Chuang et. 
al. (1995). Large numerical studies are described by 
Miquel et. al. (1996) and Barenco et. al. (1997). 

Classical computers are reliable not because they are 
perfectly engineered, but because they are insensitive 
to noise. One way to understand this is to examine 
in detail a device such as a flip-flop, or even a humble 
mechanical switch. Their stability is based on a combination
of amplification and dissipation: a small departure
of a mechanical switch from 'on' or 'off' results
in a large restoring force from the spring. Amplifiers 
do the corresponding job in a flip-flop. The restoring 
force is not sufficient alone, however: with a conser- 
vative force, the switch would oscillate between 'on' 
and 'off'. It is important also to have damping, sup- 
plied by an inelastic collision which generates heat in 
the case of a mechanical switch, and by resistors in the 
electronic flip-flop. However, these methods are ruled
out for a quantum computer by the fundamental principles
of quantum mechanics. The no-cloning theorem
means amplification of unknown quantum states is im- 
possible, and dissipation is incompatible with unitary 
evolution. 

Such fundamental considerations lead to the widely ac- 
cepted belief that quantum mechanics rules out the 
possibility to stabilize a quantum computer against the 
effects of random noise. A repeated projection of the 
computer's state by well-chosen measurements is not
in itself sufficient (Berthiaume et. al. 1994, Miquel et. 
al 1997). However, by careful application of informa- 
tion theory one can find a way around this impasse. 
The idea is to adapt the error correction methods of 
classical information theory to the quantum situation. 

Quantum error correction (QEC) was established as 
an important and general method by Steane (1996b) 
and independently Calderbank and Shor (1996). Some 
of the ideas had been introduced previously by Shor 
(1995b) and Steane (1996a). They are related to the 
'entanglement purification' introduced by Bennett et.
al. (1996a) and independently Deutsch et. al. (1996).
The theory of QEC was further advanced by Knill 
and Laflamme (1997), Ekert and Macchiavello (1996), 
Bennett et. al. (1996b). The latter paper describes 
the optimal 5-qubit code also independently discov- 
ered by Laflamme et. al. (1996). Gottesman (1996) 
and Calderbank et. al. (1997) discovered a general 
group-theoretic framework, introducing the important 
concept of the stabilizer, which also enabled many 
more codes to be found (Calderbank et. al. 1996, 
Steane 1996cd). Quantum coding theory reached a 
further level of maturity with the discovery by Shor 
and Laflamme (1997) of a quantum analogue to the 
MacWilliams identities of classical coding theory. 



QEC uses networks of quantum gates and measurements,
and at first it was not clear whether these networks
had themselves to be perfect in order for the
method to work. An important step forward was taken 
by Shor (1996) and Kitaev (1996) who showed how 
to make error correcting networks tolerant of errors 
within the network. In other words, such 'fault tol- 
erant' networks remove more noise than they intro- 
duce. Shor's methods were generalised by DiVincenzo 
and Shor (1996) and made more efficient by Steane 
(1997a, c). Knill and Laflamme (1996) introduced the 
idea of 'concatenated' coding, which is a recursive cod- 
ing method. It has the advantage of allowing arbitrar- 
ily long quantum computations as long as the noise 
per elementary operation is below a finite threshold, 
at the cost of inefficient use of quantum memory (so 
requiring a large computer). This threshold result was 
derived by several authors (Knill et al 1996, Aharonov 
and Ben-Or 1996, Gottesman et. al. 1996). Further 
fault tolerant methods are described by Knill et. al. 
(1997), Gottesman (1997), Kitaev (1997). 

The discovery of QEC was roughly simultaneous with 
that of a related idea which also permits noise-free 
transmission of quantum states over a noisy quantum 
channel. This is the 'entanglement purification' (Ben- 
nett et. al. 1996a, Deutsch et. al. 1996). The cen- 
tral idea here is for Alice to generate many entangled 
pairs of qubits, sending one of each pair down the noisy 
channel to Bob. Bob and Alice store their qubits, and 
perform simple parity checking measurements: for example,
Bob performs XOR between a given qubit and
the next he receives, then measures just the target
qubit. Alice does the same on her qubits, and they
compare results. If they agree, the unmeasured qubits
are (by chance) closer than average to the desired state
$|00\rangle + |11\rangle$. If they disagree, the qubits are rejected.
By recursive use of such checks, a few 'good' entangled 
pairs are distilled out of the many noisy ones. Once in 
possession of a good entangled state, Alice and Bob can 
communicate by teleportation. A thorough discussion 
is given by Bennett et. al. (1996b). 

Using similar ideas, with important improvements, van 
Enk et. al. (1997) have recently shown how quan- 
tum information might be reliably transmitted between 
atoms in separated high-Q optical cavities via imper- 
fect optical fibres, using imperfect gate operations. 

I will now outline the main principles of QEC.

Let us write down the worst possible thing which could 
happen to a single qubit: a completely general interac- 
tion between a qubit and its environment is 

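\[
|e_i\rangle \left( a|0\rangle + b|1\rangle \right) \rightarrow
a \left( c_{00} |e_{00}\rangle |0\rangle + c_{01} |e_{01}\rangle |1\rangle \right)
+ b \left( c_{10} |e_{10}\rangle |1\rangle + c_{11} |e_{11}\rangle |0\rangle \right)
\qquad (42)
\]

where $|e_{\ldots}\rangle$ denotes states of the environment and $c_{\ldots}$
are coefficients depending on the noise. The first significant
point is to notice that this general interaction
can be written

\[
|e_i\rangle |\phi\rangle \rightarrow \left( |e_I\rangle I + |e_X\rangle X + |e_Y\rangle Y + |e_Z\rangle Z \right) |\phi\rangle
\qquad (43)
\]

where $|\phi\rangle = a|0\rangle + b|1\rangle$ is the initial state of the
qubit, and $|e_I\rangle = c_{00} |e_{00}\rangle + c_{10} |e_{10}\rangle$,
$|e_X\rangle = c_{01} |e_{01}\rangle + c_{11} |e_{11}\rangle$, and so on.
Note that these environment states
are not necessarily normalised. Eq. (43) tells us that
we have essentially three types of error to correct on
each qubit: X, Y and Z errors. These are 'bit flip' (X)
errors, phase errors (Z) or both (Y = XZ).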
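
The step from a general interaction to a sum of I, X, Y and Z terms rests on the fact that these four operators span all 2 x 2 matrices. A minimal sketch of that decomposition follows; it uses the convention Y = XZ of the text, the helper names (pauli_coefficients, recombine) and the sample matrix are illustrative assumptions, and the explicit factors of 1/2 that appear here are presumably absorbed into the un-normalised environment states of the text.

```python
I = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Z = ((1, 0), (0, -1))
Y = ((0, -1), (1, 0))   # Y = XZ, the convention used in the text (no factor of i)

def pauli_coefficients(m):
    """Return (cI, cX, cY, cZ) such that m = cI*I + cX*X + cY*Y + cZ*Z."""
    (m00, m01), (m10, m11) = m
    return ((m00 + m11) / 2, (m01 + m10) / 2, (m10 - m01) / 2, (m00 - m11) / 2)

def recombine(coeffs):
    """Rebuild the 2x2 matrix from its four coefficients."""
    return tuple(
        tuple(sum(c * p[r][col] for c, p in zip(coeffs, (I, X, Y, Z)))
              for col in range(2))
        for r in range(2)
    )

# An arbitrary (hypothetical) operator acting on one qubit.
m = ((0.9, 0.2 + 0.1j), (-0.3j, 0.7))
coeffs = pauli_coefficients(m)
print(coeffs)
print(recombine(coeffs))
```

Because any single-qubit operator can be split this way, a code that corrects X, Y and Z errors on a qubit corrects every possible error on that qubit.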

Suppose our computer q is to manipulate k qubits of
quantum information. Let a general state of the k
qubits be $|\phi\rangle$. We first make the computer larger,
introducing a further $n - k$ qubits, initially in the state
$|0\rangle$. Call the enlarged system qc. An 'encoding' operation
is performed: $E(|\phi\rangle |0\rangle) = |\phi_E\rangle$. Now, let noise
affect the n qubits of qc. Without loss of generality,
the noise can be written as a sum of 'error operators'
M, where each error operator is a tensor product of
n operators (one for each qubit), taken from the set
{I, X, Y, Z}. For example $M = I_1 X_2 I_3 Y_4 Z_5 X_6 I_7$ for
the case n = 7. A general noisy state is

\[
\sum_s |e_s\rangle M_s |\phi_E\rangle \qquad (44)
\]

Now we introduce even more qubits: a further $n - k$,
prepared in the state $|0\rangle_a$. This additional set is
called an 'ancilla'. For any given encoding E, there
exists a syndrome extraction operation A, operating
on the joint system of qc and a, whose effect is
$A(M_s |\phi_E\rangle |0\rangle_a) = (M_s |\phi_E\rangle) |s\rangle_a$, $\forall M_s \in S$. The set S
is the set of correctable errors, which depends on the
encoding. In the notation $|s\rangle_a$, s is just a binary number
which indicates which error operator $M_s$ we are
dealing with, so the states $|s\rangle_a$ are mutually orthogonal.
Suppose for simplicity that the general noisy state
(44) only contains $M_s \in S$, then the joint state of environment,
qc and a after syndrome extraction is

\[
\sum_s |e_s\rangle \left( M_s |\phi_E\rangle \right) |s\rangle_a \qquad (45)
\]

We now measure the ancilla state, and something
rather wonderful happens: the whole state collapses
onto $|e_s\rangle (M_s |\phi_E\rangle) |s\rangle_a$, for some particular value of s.
Now, instead of general noise, we have just one particular
error operator $M_s$ to worry about. Furthermore,
the measurement tells us the value s (the 'error syndrome')
from which we can deduce which $M_s$ we have!
Armed with this knowledge, we apply $M_s^{-1}$ to qc by
means of a few quantum gates (X, Z or Y), thus producing
the final state $|e_s\rangle |\phi_E\rangle |s\rangle_a$. In other words, we
have recovered the noise-free state of qc! The final environment
state is immaterial, and we can re-prepare
the ancilla in $|0\rangle_a$ for further use.
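
The logic of syndrome extraction followed by correction can be illustrated on a much simpler code than those discussed below. The sketch uses the three-qubit repetition code, which is an assumption made for brevity: it corrects a single X error only, and the syndrome is read off classically from the basis labels rather than by coupling to an ancilla and measuring it. All function names are illustrative.

```python
def encode(a, b):
    """Encode a|0> + b|1> as a|000> + b|111> (three-qubit repetition code)."""
    return {'000': a, '111': b}

def apply_x(state, qubit):
    """Flip the given qubit (0, 1 or 2) in every basis component."""
    flip = {'0': '1', '1': '0'}
    return {bits[:qubit] + flip[bits[qubit]] + bits[qubit + 1:]: amp
            for bits, amp in state.items()}

def syndrome(state):
    """Parities (q0 xor q1, q1 xor q2); identical for every component of the state."""
    found = {(int(b[0]) ^ int(b[1]), int(b[1]) ^ int(b[2])) for b in state}
    assert len(found) == 1, "error is outside the correctable set"
    return found.pop()

# Which qubit to flip back for each syndrome (None means no error).
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def correct(state):
    """Undo the error indicated by the syndrome."""
    qubit = CORRECTION[syndrome(state)]
    return state if qubit is None else apply_x(state, qubit)

original = encode(0.6, 0.8)
for q in range(3):
    noisy = apply_x(original, q)
    print(q, syndrome(noisy), correct(noisy) == original)
```

The point to notice is that the syndrome depends only on which qubit was flipped, never on the amplitudes a and b, so learning it reveals nothing about the stored state.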

The only assumption in the above was that the noise in 
eq. (44) only contains error operators in the correctable
set S. In practice, the noise includes both members and 
non-members of S, and the important quantity is the 
probability that the state collapses onto a correctable 
one when the syndrome is extracted. It is here that the 
theory of error-correcting codes enters in: our task is to 
find encoding and extraction operations E, A such that 
the set S of correctable errors includes all the errors 
most likely to occur. This is a very difficult problem. 

It is a general truth that to permit efficient stabiliza- 
tion against noise, we have to know something about 
the noise we wish to suppress. The most obvious quasi- 
realistic assumption is that of uncorrelated stochastic 
noise. That is, at a given time or place the noise might 
have any effect, but the effects on different qubits, or 
on the same qubit at different times, are uncorrelated. 
This is the quantum equivalent of the binary symmetric
channel, section 2.3. By assuming uncorrelated
stochastic noise we can place all possible error operators
M in a hierarchy of probability: those affecting
few qubits (i.e. only a few terms in the tensor product
are different from I) are most likely, while those affecting
many qubits at once are unlikely. Our aim will
be to find quantum error correcting codes (QECCs) 
such that all errors affecting up to t qubits will be cor- 
rectable. Such a QECC is termed a 't-error correcting 
code'. 

The simplest code construction (that discovered by
Calderbank and Shor and Steane) goes as follows. First
we notice that a classical error correcting code, such as
the Hamming code shown in table 1, can be used to
correct X errors. The proof relies on eq. (45), which
permits the syndrome extraction A to produce an ancilla
state $|s\rangle$ which depends only on the error $M_s$ and
not on the computer's state $|\phi\rangle$. This suggests that
we store k quantum bits by means of the $2^k$ mutually
orthogonal n-qubit states $|i\rangle$, where the binary number
i is a member of a classical error correcting code
C, see section 2.4. This will not allow correction of
Z errors, however. Observe that since Z = HXH, the 
correction of Z errors is equivalent to rotating the state 
of each qubit by H, correcting X errors, and rotating 
back again. This rotation is called a Hadamard trans- 
form; it is just a change in basis. The next ingredient is 
to notice the following special property (Steane 1996a): 



\[
\tilde{H} \sum_{i \in C} |i\rangle \propto \sum_{i \in C^{\perp}} |i\rangle \qquad (46)
\]

where $\tilde{H} = H_1 H_2 H_3 \cdots H_n$. In words, this says that
if we make a quantum state by superposing all the
members of a classical error correcting code C, then
the Hadamard-transformed state is just a superposition
of all the members of the dual code $C^{\perp}$. From this
it follows, after some further steps, that it is possible
to correct both X and Z errors (and therefore also Y
errors) if we use quantum states of the form given in
eq. (46), as long as both C and $C^{\perp}$ are good classical
error correcting codes, i.e. both have good correction
abilities.
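
The dual-code property can be verified by brute force for a small code. The sketch below assumes a standard parity-check matrix for the [7,4] Hamming code (the table referred to in the text is not reproduced here, so this particular matrix is an assumption) and computes, for every 7-bit word j, the sum over code words i of $(-1)^{i \cdot j}$, which is the amplitude of $|j\rangle$ after the Hadamard transform up to a common factor.

```python
from itertools import product

# Parity-check matrix of a [7,4] Hamming code (an assumed, standard choice).
CHECKS = [(0, 0, 0, 1, 1, 1, 1),
          (0, 1, 1, 0, 0, 1, 1),
          (1, 0, 1, 0, 1, 0, 1)]

def dot(u, v):
    """Inner product of two binary words, modulo 2."""
    return sum(a & b for a, b in zip(u, v)) % 2

WORDS = list(product((0, 1), repeat=7))
# C: all words that satisfy every parity check.
C = [w for w in WORDS if all(dot(h, w) == 0 for h in CHECKS)]
# C_PERP: all words orthogonal to every member of C.
C_PERP = [w for w in WORDS if all(dot(c, w) == 0 for c in C)]

def hadamard_overlap(j):
    """Sum over i in C of (-1)^(i.j): amplitude of |j> times sqrt(2^7)."""
    return sum(1 - 2 * dot(i, j) for i in C)

support = [j for j in WORDS if hadamard_overlap(j) != 0]
print(len(C), len(C_PERP), support == C_PERP)
```

Every word outside the dual code receives zero amplitude through destructive interference, while all members of the dual code receive the same amplitude.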

The simplest QECC constructed by the above recipe
requires n = 7 qubits to store a single (k = 1) qubit
of useful quantum information. The two orthogonal
states required to store the information are built from
the Hamming code shown in table 1:

\[
\begin{aligned}
|0_E\rangle = {} & |0000000\rangle + |1010101\rangle + |0110011\rangle + |1100110\rangle \\
& + |0001111\rangle + |1011010\rangle + |0111100\rangle + |1101001\rangle
\end{aligned}
\qquad (47)
\]

\[
\begin{aligned}
|1_E\rangle = {} & |1111111\rangle + |0101010\rangle + |1001100\rangle + |0011001\rangle \\
& + |1110000\rangle + |0100101\rangle + |1000011\rangle + |0010110\rangle
\end{aligned}
\qquad (48)
\]

Such a QECC has the following remarkable property.
Imagine I store a general (unknown) state of a single
qubit into a spin state $a|0_E\rangle + b|1_E\rangle$ of 7 spin-half
particles. I then allow you to do anything at all to
any one of the 7 spins. I could nevertheless extract
my original qubit state exactly. Therefore the large
perturbation you introduced did nothing at all to the
stored quantum information!

More powerful QECCs can be obtained from more powerful
classical codes, and there exist quantum code constructions
more efficient than the one just outlined.
Suppose we store k qubits into n. There are 3n ways
for a single qubit to be in error, since the error might
be one of X, Y or Z. The number of syndrome bits
is $n - k$, so if every single-qubit error, and the error-free
case, is to have a different syndrome, we require
$2^{n-k} \geq 3n + 1$. For k = 1 this lower limit is filled exactly
by n = 5 and indeed such a 5-qubit single-error
correcting code exists (Laflamme et. al. 1996, Bennett
et. al. 1996b).
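
The counting argument is easy to check numerically. The following sketch simply searches for the smallest n satisfying the inequality for a given k; the function names are illustrative.

```python
def smallest_n(k):
    """Smallest n with 2**(n - k) >= 3*n + 1, the counting bound for single-error correction."""
    n = k + 1
    while 2 ** (n - k) < 3 * n + 1:
        n += 1
    return n

def is_filled_exactly(n, k):
    """True when the number of syndromes equals the number of cases to distinguish."""
    return 2 ** (n - k) == 3 * n + 1

print(smallest_n(1), is_filled_exactly(5, 1))   # prints: 5 True
print(is_filled_exactly(7, 1))                  # prints: False
```

For n = 5 and k = 1 the 16 possible syndromes are used up exactly by the 15 single-qubit errors plus the error-free case, whereas the 7-qubit code above has syndromes to spare.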

More generally, the remarkable fact is that for fixed
$k/n$, codes exist for which $t/n$ is bounded from below
as $n \rightarrow \infty$ (Calderbank and Shor 1995, Steane 1996b,
Calderbank et. al. 1997). This leads to a quantum
version of Shannon's theorem (section 2.4), though an
exact definition of the capacity of a quantum channel
remains unclear (Schumacher and Nielsen 1996, Barnum
et. al. 1996, Lloyd 1997, Bennett et. al. 1996b,
Knill and Laflamme 1997a). For finite n, the probability
that the noise produces uncorrectable errors scales
roughly as $(n\epsilon)^{t+1}$, where $\epsilon \ll 1$ is the probability of
an arbitrary error on each qubit. This represents an
extremely powerful noise suppression. We need to be
able to reduce $\epsilon$ to a sufficiently small value by passive
means, and then QEC does the rest. For example,
consider the case $\epsilon \simeq 0.001$. With n = 23 there
exists a code correcting all t = 3-qubit errors (Golay
1949, Steane 1996c). The probability that uncorrectable
noise occurs is $\simeq 0.023^4 \simeq 3 \times 10^{-7}$, thus the
noise is suppressed by more than three orders of magnitude.
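
The arithmetic of this example can be reproduced directly. The sketch below evaluates only the rough scaling law quoted above, not an exact failure probability, and the function name is an illustrative choice.

```python
def uncorrectable_probability(n, epsilon, t):
    """Rough scaling (n * epsilon)**(t + 1) quoted in the text; not an exact probability."""
    return (n * epsilon) ** (t + 1)

p = uncorrectable_probability(n=23, epsilon=0.001, t=3)
print(f"{p:.1e}")                  # about 2.8e-07
print(p < 0.001 / 1000)            # suppressed by more than three orders of magnitude
```

Because the exponent is t + 1, every additional error that the code can correct multiplies the failure probability by a further factor of roughly $n\epsilon$.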

So far I have described QEC as if the ancilla and the 
many quantum gates and measurements involved were 
themselves noise-free. Obviously we must drop this as- 
sumption if we want to form a realistic impression of 
what might be possible in quantum computing. Shor 
(1996) and Kitaev (1996) discovered ways in which all 
the required operations can be arranged so that the cor- 
rection suppresses more noise than it introduces. The 
essential ideas are to verify states wherever possible, 
to restrict the propagation of errors by careful network 
design, and to repeat the syndrome extraction: for each 
group of qubits qc, the syndrome is extracted several
times and qc is only corrected once t + 1 mutually consistent
syndromes are obtained. Fig. 14 illustrates a
fault-tolerant syndrome extraction network, i.e. one 
which restricts the propagation of errors. Note that a 
is verified before it is used, and each qubit in qc only 
interacts with one qubit in a. 

In fault-tolerant computing, we cannot apply arbitrary 
rotations of a logical qubit, eq. (33), in a single step.
However, particular rotations through irrational angles 
can be carried out, and thus general rotations are gen- 
erated to an arbitrary degree of precision through repe- 
tition. Note that the set of computational gates is now 
discrete rather than continuous. 
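
The idea that repeating one fixed rotation can approximate any rotation angle is illustrated by the following sketch. It treats only the angle about a single axis, not the full fault-tolerant gate construction, and the step angle arccos(3/5) is an assumed example chosen because it is an irrational multiple of $\pi$; the function name is illustrative.

```python
import math

def repetitions_for(target, step, tolerance, max_repeats=100000):
    """Smallest m >= 1 such that m*step equals target modulo 2*pi to within tolerance."""
    two_pi = 2 * math.pi
    for m in range(1, max_repeats + 1):
        diff = (m * step - target) % two_pi
        if min(diff, two_pi - diff) < tolerance:
            return m
    return None

STEP = math.acos(3 / 5)   # assumed example angle; an irrational multiple of pi
m = repetitions_for(target=1.0, step=STEP, tolerance=1e-3)
print(m)
```

Tightening the tolerance increases the number of repetitions needed, which is the price paid for working with a discrete gate set.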

Recently the requirements for reliable quantum computing
using fault-tolerant QEC have been estimated
(Preskill 1997, Steane 1997c). They are formidable.
For example, a computation beyond the capabilities of
the best classical computers might require 1000 qubits
and $10^{10}$ quantum gates. Without QEC, this would
require a noise level of order $10^{-13}$ per qubit per gate,
which we can rule out as impossible. With QEC, the 
computer would have to be made ten or perhaps one 
hundred times larger, and many thousands of gates 
would be involved in the correctors for each elemen- 
tary step in the computation. However, much more 
noise could be tolerated: up to about $10^{-5}$ per qubit
per gate (i.e. in any of the gates, including those in 
the correctors) (Steane 1997c). This is daunting but 
possible. 

The error correction methods briefly described here are 
not the only type possible. If we know more about 
the noise, then humbler methods requiring just a few 
qubits can be quite powerful. Such a method was proposed
by Cirac et. al. (1996) to deal with the principal
noise source in an ion trap, which is changes of the motional
state during gate operations. Also, some joint
states of several qubits can have reduced noise if the environment
affects all qubits together. For example the
two states $|01\rangle \pm |10\rangle$ are unchanged by environmental
coupling of the form $|e_0\rangle I_1 I_2 + |e_1\rangle X_1 X_2$ (Palma et.
al. 1996, Chuang and Yamamoto 1997). Such states
offer a calm eye within the storm of decoherence, in 
which quantum information can be manipulated with 
relative impunity. A practical computer would proba- 
bly use a combination of methods. 
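
That the two states mentioned above are insensitive to a joint flip of both qubits can be checked in a few lines. This is a sketch with un-normalised states and an illustrative function name; it shows that one state is mapped to itself and the other to minus itself, so that in both cases the two-qubit state is unchanged apart from an overall sign.

```python
def apply_x1x2(state):
    """Apply X to both qubits: every two-bit label has both bits flipped."""
    flip = {'0': '1', '1': '0'}
    return {flip[bits[0]] + flip[bits[1]]: amp for bits, amp in state.items()}

plus = {'01': 1, '10': 1}     # |01> + |10>, unnormalised
minus = {'01': 1, '10': -1}   # |01> - |10>, unnormalised

print(apply_x1x2(plus) == plus)
print(apply_x1x2(minus) == {k: -v for k, v in minus.items()})
print(apply_x1x2({'00': 1}))   # by contrast, |00> is turned into |11>
```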



10 Discussion 

The idea of 'Quantum Computing' has fired many 
imaginations simply because the words themselves suggest
something strange but powerful, as if the physicists
have come up with a second revolution in information
processing to herald the next millennium. This is a
false impression. Quantum computing will not replace 
classical computing for similar reasons that quantum 
physics does not replace classical physics: no one ever 
consulted Heisenberg in order to design a house, and 
no one takes their car to be mended by a quantum 
mechanic. If large quantum computers are ever made, 
they will be used to address just those special tasks 
which benefit from quantum information processing. 

A more lasting reason to be excited about quantum 
computing is that it is a new and insightful way to 
think about the fundamental laws of physics. The 
quantum computing community remains fairly small 
at present, yet the pace of progress has been fast and 
accelerating in the last few years. The ideas of clas- 
sical information theory seem to fit into quantum me- 
chanics like a hand into a glove, giving us the feel- 
ing that we are uncovering something profound about 
Nature. Shannon's noiseless coding theorem leads to 
Schumacher and Jozsa's quantum coding theorem and
the significance of the qubit as a useful measure of in- 
formation. This enables us to keep track of quantum 
information, and to be confident that it is indepen- 
dent of the details of the system in which it is stored. 
This is necessary to underpin other concepts such as 
error correction and computing. The classical theory 
of error correction leads to the discovery of quantum 
error correction. This allows a physical process pre- 
viously thought to be impossible, namely the almost 
perfect recovery of a general quantum state, undoing 
even irreversible processes such as relaxation by spon- 
taneous emission. For example, during a long error- 
corrected quantum computation, using fault-tolerant 
methods, every qubit in the computer might decay a 
million times and yet the coherence of the quantum 
information be preserved. 

Hilbert's questions regarding the logical structure of
mathematics encourage us to ask a new type of
question about the laws of physics. In looking at
Schrödinger's equation, we can neglect whether it is
describing an electron or a planet, and just ask about
the state manipulations it permits. The language of
information and computer science enables us to frame 
such questions. Even such a simple idea as the quan- 
tum gate, the cousin of the classical binary logic gate, 
turns out to be very useful, because it enables us 
to think clearly about quantum state manipulations 
which would otherwise seem extremely complicated or 
impractical. Such ideas open the way to the design of 
quantum algorithms such as those of Shor, Grover and 
Kitaev. These show that quantum mechanics allows 
information processing of a kind ruled out in classical 
physics. It relies on the propagation of a quantum state 
through a huge (exponentially large) number of dimen- 
sions of Hilbert space. The computation result arises 
from a controlled interference among many computa- 
tional paths, which, even after we have examined the 
mathematical description, still seems wonderful and 
surprising. 
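
The interference among computational paths can be illustrated on the smallest possible case. The following is a minimal sketch, not taken from the review, assuming a single qubit acted on twice by the Hadamard gate H: the two paths leading from |0> to |1> carry amplitudes of opposite sign and cancel, while the two paths back to |0> add. The names `apply` and `path_amplitudes` are illustrative.

```python
import math

# Hadamard gate as a 2x2 real matrix.
R = 1 / math.sqrt(2)
H = [[R, R],
     [R, -R]]

def apply(gate, state):
    """Multiply a 2x2 gate into a two-component state vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]

def path_amplitudes(final):
    """Amplitude of each path |0> -> |mid> -> |final> through two H gates."""
    return [H[final][mid] * H[mid][0] for mid in range(2)]

# Start in |0>, apply H twice.
state = apply(H, apply(H, [1.0, 0.0]))
```

Summing `path_amplitudes(1)` gives zero (destructive interference) and summing `path_amplitudes(0)` gives one, in agreement with `state`. A quantum algorithm arranges the same kind of cancellation over exponentially many paths.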

The intrinsic difficulty of quantum computation lies in 
the sensitivity of large-scale interference to noise and 
imprecision. A point often raised against the quantum 
computer is that it is essentially an analogue rather 
than a digital device, and has many limitations as a re- 
sult. This is a misconception. It is true that any quan- 
tum system has a continuous state space, but so has 
any classical system, including the circuits of a digital 
computer. The fault-tolerant methods used to permit 
error correction in a quantum computer restrict the set 
of quantum gates to a discrete set, so that the 'legal' 
states of the quantum computer are discrete, just as 
in a classical digital computer. The really important 
difference between analogue and digital computing is 
that to increase the precision of a result arrived at by 
analogue means, one must re-engineer the whole com- 
puter, whereas with digital methods one need merely 
increase the number of bits and operations. The fault- 
tolerant quantum computer has more in common with 
a digital than an analogue device. 

Shor's algorithm for the factorisation problem stimu- 
lated a lot of interest in part because of the connection 
with data encryption. However, I feel that the signifi- 
cance of Shor's algorithm is not primarily in its possible 
use for factoring large integers in the distant future. 
Rather, it has acted as a stimulus to the field, prov- 
ing the existence of a powerful new type of computing 
made possible by controlled quantum evolution, and 
exhibiting some of the new methods. At present, the 
most practically significant achievement in the general 
area of quantum information physics is not in comput- 
ing at all, but in quantum key distribution. 

The title 'quantum computer' will remain a misnomer 
for any experimental device realised in the next twenty 
years. It is an abuse of language to call even a pocket 
calculator a 'computer', because the word has come to 
be reserved for general-purpose machines which more 
or less realise Turing's concept of the Universal Ma- 
chine. The same ought to be true for quantum comput- 
ers if we do not want to mislead people. However, small 
quantum information processors may serve useful roles. 
For example, concepts learned from quantum informa- 
tion theory may permit the discovery of useful new 
spectroscopic methods in nuclear magnetic resonance. 
Quantum key distribution could be made more secure, 
and made possible over larger distances, if small 'relay 
stations' could be built which applied purification or 
error correction methods. The relay station could be 
an ion trap combined with a high-Q cavity, which is 
realisable with current technology. It will surely not 
be long before a quantum state is teleported from one 
laboratory to another, a very exciting prospect. 

The great intrinsic value of a large quantum computer 
is offset by the difficulty of making one. However, few 
would argue that this prize does not at least merit a lot 
of effort to find out just how unattainable, or hopefully 
attainable, it is. One of the chief uses of a processor 
which could manipulate a few quantum bits may be to 
help us better understand decoherence in quantum me- 
chanics. This will be amenable to experimental inves- 
tigation during the next few years: rather than waiting 
in hope, there is useful work to be done now. 

On the theoretical side, there are two major open ques- 
tions: the nature of quantum algorithms, and the lim- 
its on reliability of quantum computing. It is not yet 
clear what is the essential nature of quantum comput- 
ing, and what general class of computational problem is 
amenable to efficient solution by quantum methods. Is 
there a whole mine of useful quantum algorithms wait- 
ing to be delved, or will the supply dry up with the 
few nuggets we have so far discovered? Can significant 
computational power be achieved with less than 100 
qubits? This is by no means ruled out, since it is hard 
to simulate even 20 qubits by classical means. Concern- 
ing reliability, great progress has been made, so that we 
can now be cautiously optimistic that quantum com- 
puting is not an impossible dream. We can identify re- 
quirements sufficient to guarantee reliable computing, 
involving for example uncorrelated stochastic noise of 
order 10~^ per gate, and a quantum computer a hun- 
dred times larger than the logical machine embedded 
within it. However, can quantum decoherence be re- 
lied upon to have the properties assumed in such an 
estimate, and if not then can error correction methods 
still be found? Conversely, once we know more about 
the noise, it may be possible to identify considerably 
less taxing requirements for reliable computing. 

To conclude, I would like to propose a more wide- 
ranging theoretical task: to arrive at a set of principles 
like energy and momentum conservation, but which ap- 
ply to information, and from which much of quantum 
mechanics could be derived. Two tests of such ideas 
would be whether the EPR-Bell correlations thus be- 
came transparent, and whether they rendered obvious 
the proper use of terms such as 'measurement' and 
'knowledge'. 

I hope that quantum information physics will be recog- 
nised as a valuable part of fundamental physics. The 
quest to bring together Turing machines, information, 
number theory and quantum physics is for me, and 
I hope will be for readers of this review, one of the 
most fascinating cultural endeavours one could have 
the good fortune to encounter. 

I thank the Royal Society and St Edmund Hall, Oxford, 
for their support. 
Fig. 1. Maxwell's demon. In this illustration the demon sets up a pressure difference by only raising the 
partition when more gas molecules approach it from the left than from the right. This can be done in a 
completely reversible manner, as long as the demon's memory stores the random results of its observations of 
the molecules. The demon's memory thus gets hotter. The irreversible step is not the acquisition of information, 
but the loss of information if the demon later clears its memory. 
Fig. 2. Relationship between quantum mechanics and information theory. This diagram is not intended to 
be a definitive statement, the placing of entries being to some extent subjective, but it indicates many of the 
connections discussed in the article. 
Fig. 4. The standard communication channel ("the information theorist's coat of arms"). The source (Alice) 
produces information which is manipulated ('encoded') and then sent over the channel. At the receiver (Bob) 
the received values are 'decoded' and the information thus extracted. 
Fig. 5. Illustration of Shannon's theorem. Alice sends n = 100 bits over a noisy channel, in order to communicate 
k bits of information to Bob. The figure shows the probability that Bob interprets the received data correctly, as 
a function of k/n, when the error probability per bit is p = 0.25. The channel capacity is $C = 1 - H(0.25) \simeq 0.19$. 
Dashed line: Alice sends each bit repeated n/k times. Full line: Alice uses the best linear error-correcting code 
of rate k/n. The dotted line gives the performance of error-correcting codes with larger n, to illustrate Shannon's 
theorem. 
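
The capacity quoted in the caption of Fig. 5 can be checked numerically. The following is a minimal sketch, assuming the noisy channel is a binary symmetric channel with bit-flip probability p, for which C = 1 - H(p) with H the binary entropy function. The function names are illustrative, and `repetition_success` assumes simple majority voting over an odd number of copies, as a stand-in for the repetition strategy of the dashed line.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with flip probability p."""
    return 1.0 - binary_entropy(p)

def repetition_success(p, r):
    """Probability that a majority vote over r copies (r odd) of one bit
    recovers the bit, i.e. that at most (r - 1) / 2 copies are flipped."""
    return sum(math.comb(r, j) * p**j * (1 - p)**(r - j)
               for j in range((r - 1) // 2 + 1))

capacity = bsc_capacity(0.25)
```

With p = 0.25 the capacity evaluates to about 0.19, and sending each bit three times raises the per-bit success probability from 0.75 to 0.84375 at the cost of a rate of 1/3.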
Fig. 6. A classical computer can be built from a network of logic gates. 
Fig. 7. The Turing Machine. This is a conceptual mechanical device which can be shown to be capable of 
efficiently simulating all classical computational methods. The machine has a finite set of internal states, and 
a fixed design. It reads one binary symbol at a time, supplied on a tape. The machine's action on reading a 
given symbol s depends only on that symbol and the internal state G. The action consists in overwriting a new 
symbol s' on the current tape location, changing state to G', and moving the tape one place in direction d (left 
or right). The internal construction of the machine can therefore be specified by a finite fixed list of rules of 
the form $(s, G \rightarrow s', G', d)$. One special internal state is the 'halt' state: once in this state the machine ceases 
further activity. An input 'programme' on the tape is transformed by the machine into an output result printed 
on the tape. 
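
The rule list described in the caption of Fig. 7 can be sketched as a lookup table from (symbol, state) to (new symbol, new state, direction). The following is a minimal illustration under stated assumptions: a blank symbol '_' marks unused tape (the caption mentions only binary symbols), the read head moves instead of the tape, and the example rule table, which adds one to a binary number, is hypothetical and chosen only for brevity.

```python
def run_turing_machine(rules, tape, state, head=0, blank="_", max_steps=10_000):
    """Apply rules (s, G) -> (s', G', d) until the machine enters 'halt'."""
    cells = dict(enumerate(tape))            # sparse tape: position -> symbol
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        new_symbol, state, direction = rules[(symbol, state)]
        cells[head] = new_symbol             # overwrite the current location
        head += 1 if direction == "R" else -1
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Illustrative rule table: binary increment, starting on the rightmost bit.
INCREMENT = {
    ("1", "carry"): ("0", "carry", "L"),     # 1 + carry = 0, carry moves left
    ("0", "carry"): ("1", "halt", "L"),      # 0 + carry = 1, finished
    ("_", "carry"): ("1", "halt", "L"),      # ran off the left end: new digit
}

def increment(bits):
    """Run the increment machine on a binary string such as '1011'."""
    return run_turing_machine(INCREMENT, bits, state="carry", head=len(bits) - 1)
```

For example, `increment("1011")` returns `"1100"`. The machine itself is fixed by the three rules; only the tape contents change from run to run.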
Fig. 8. Example 'quantum network.' Each horizontal line represents one qubit evolving in time from left to 
right. A symbol on one line represents a single-qubit gate. Symbols on two qubits connected by a vertical 
line represent a two-qubit gate operating on those two qubits. The network shown carries out the operation 
$X_1 H_2 \mathrm{XOR}_{1,3}$. The ⊕ symbol represents X (NOT), the encircled H is the H gate, the filled circle linked to ⊕ 
is controlled-NOT. 
Fig. 9. Basic quantum communication concepts. The figure gives quantum networks for (a) dense coding, (b) 
teleportation and (c) data compression. The spatial separation of Alice and Bob is in the vertical direction; 
time evolves from left to right in these diagrams. The boxes represent measurements, the dashed lines represent 
classical information. 
Fig. 10. Quantum network for Shor's period-finding algorithm. Here each horizontal line is a quantum register 
rather than a single qubit. The circles at the left represent the preparation of the input state $|0\rangle$. The encircled 
FT represents the Fourier transform (see text), and the box linking the two registers represents a network to 
perform $U_f$. The algorithm finishes with a measurement of the x register. 
Fig. 11. Evolution of the quantum state in Shor's algorithm. The quantum state is indicated schematically by 
identifying the non-zero contributions to the superposition. Thus a general state $\sum_{x,y} c_{x,y} |x\rangle |y\rangle$ is indicated by 
placing a filled square at all those coordinates (x, y) on the diagram for which $c_{x,y} \neq 0$. (a) eq. (|3^). (b) eq. 
Fig. 12. Ion trap quantum information processor. A string of singly-charged atoms is stored in a linear ion 
trap. The ions are separated by ~ 20 μm by their mutual repulsion. Each ion is addressed by a pair of laser 
beams which coherently drive both Raman transitions in the ions, and also transitions in the state of motion 
of the string. The motional degree of freedom serves as a single-qubit 'bus' to transport quantum information 
among the ions. State preparation is by optical pumping and laser cooling; readout is by electron shelving and 
resonance fluorescence, which enables the state of each ion to be measured with high signal to noise ratio. 
Fig. 13. Bulk nuclear spin resonance quantum information processor. A liquid of ~ 10^" 'designer' molecules 
is placed in a sensitive magnetometer, which can both generate oscillating magnetic fields and also detect the 
precession of the mean magnetic moment of the liquid. The situation is somewhat like having 10^'^ independent 
processors, but the initial state is one of thermal equilibrium, and only the average final state can be detected. 
The quantum information is stored and manipulated in the nuclear spin states. The spin state energy levels of 
a given nucleus are influenced by neighbouring nuclei in the molecule, which enables XOR gates to be applied. 
They are little influenced by anything else, owing to the small size of a nuclear magnetic moment, which means 
the inevitable dephasing of the processors with respect to each other is relatively slow. This dephasing can be 
undone by 'spin echo' methods. 
Fig. 14. Fault tolerant syndrome extraction, for the QECC given in equations (47), (48). The upper 7 qubits 
are qc, the lower are the ancilla a. All gates, measurements and free evolution are assumed to be noisy. Only 
H and 2-qubit XOR gates are used; when several XORs have the same control or target bit they are shown 
superimposed (NB this is a non-standard notation). The first part of the network, up until the 7 H gates, 
prepares a in $|0_E\rangle$, and also verifies a: a small box represents a single-qubit measurement. If any measurement 
gives 1, the preparation is restarted. The H gates transform the state of a to $|0_E\rangle + |1_E\rangle$. Finally, the 7 XOR 
gates between qc and a carry out a single XOR in the encoded basis $\{|0_E\rangle, |1_E\rangle\}$. This operation carries X errors 
from qc into a, and Z errors from a into qc. The X errors in qc can be deduced from the result of measuring 
a. A further network is needed to identify Z errors. Such correction never makes qc completely noise-free, but 
when applied between computational steps it reduces the accumulation of errors to an acceptable level. 
Message   Huffman   Hamming
0000      10        0000000
0001      000       1010101
0010      001       0110011
0011      11000     1100110
0100      010       0001111
0101      11001     1011010
0110      11010     0111100
0111      1111000   1101001
1000      011       1111111
1001      11011     0101010
1010      11100     1001100
1011      111111    0011001
1100      11101     1110000
1101      111110    0100101
1110      111101    1000011
1111      1111001   0010110

Table 1: Huffman and Hamming codes. The left column shows the sixteen possible 4-bit messages, the other 
columns show the encoded version of each message. The Huffman code is for data compression: the most likely 
messages have the shortest encoded forms; the code is given for the case that each message bit is three times 
more likely to be zero than one. The Hamming code is an error correcting code: every codeword differs from 
all the others in at least 3 places, therefore any single error can be corrected. The Hamming code is also linear: 
all the words are given by linear combinations of 1010101, 0110011, 0001111, 1111111. They satisfy the parity 
checks 1010101, 0110011, 0001111. 
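
The linear structure of the Hamming code in Table 1 lends itself to a short check. The following is a minimal sketch, assuming the four generator words are assigned to the message bits from left to right in the order 1111111, 0001111, 0110011, 1010101, which is the order implied by the table rows for messages 1000, 0100, 0010 and 0001. The function names are illustrative. Listing the parity checks in the order 0001111, 0110011, 1010101 makes the syndrome, read as a binary number, equal to the position (counted from 1 at the left) of a single flipped bit.

```python
# Generators for message bits, left to right (order assumed from Table 1).
GENERATORS = ["1111111", "0001111", "0110011", "1010101"]
# Parity checks, ordered so the syndrome bits have weights 4, 2, 1.
CHECKS = ["0001111", "0110011", "1010101"]

def encode(message):
    """Encode a 4-bit string as the mod-2 sum of the selected generators."""
    word = [0] * 7
    for bit, gen in zip(message, GENERATORS):
        if bit == "1":
            word = [w ^ int(g) for w, g in zip(word, gen)]
    return "".join(map(str, word))

def syndrome(word):
    """Return 0 for a codeword, else the 1-based position of a single error."""
    bits = [sum(int(w) & int(c) for w, c in zip(word, check)) % 2
            for check in CHECKS]
    return 4 * bits[0] + 2 * bits[1] + bits[2]

def correct(word):
    """Flip the bit indicated by the syndrome, if any."""
    pos = syndrome(word)
    if pos == 0:
        return word
    flipped = "1" if word[pos - 1] == "0" else "0"
    return word[:pos - 1] + flipped + word[pos:]
```

For example, `encode("0101")` gives `1011010`, matching the table, and `correct("1001010")`, the same word with its third bit flipped, returns `1011010` again.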